How to demultiplex
If you have Illumina reads with one
FASTQ file per sample, then demultiplexing has already been done for you.
If you have 454 reads with barcodes, or Illumina paired or unpaired reads
with i1 index reads, then you can use the
fastx_demux command to perform demultiplexing. If you have raw Illumina
dual index reads (i5 + i7 + r1 + r2), this is not currently supported in
usearch -- let me know and I will add the feature for you.
Several samples can be combined into a single
sequencer run by using "multiplexing", where a barcode sequence identifying
the sample is inserted into the sequencing construct. Barcodes are also
called index sequences.
With Illumina sequencing, the barcode is usually positioned before the
sequencing primer, so it does not appear in the forward reads that contain the
biological sequence. Barcodes are obtained by making one (single-indexing) or two
(dual-indexing) additional reads, which are sometimes called i1 for single
indexing and i5+i7 for dual indexing.
With other next-generation sequencers, the barcode sequence usually appears
at the beginning of the read, possibly after a machine-specific sequence
such as TCAG for 454.
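To illustrate the in-read barcode layout described above, here is a minimal Python sketch. It is not how usearch implements demultiplexing; the function name, the sample names and the barcode sequences are hypothetical, and it assumes exact matching with no barcode being a prefix of another.

```python
def assign_inline_barcode(read, barcodes, key="TCAG"):
    """Return (sample, trimmed_read), or (None, read) if no barcode matches.

    Assumptions (illustrative only): exact matches, and no barcode is a
    prefix of another barcode.
    """
    # Skip the machine-specific key sequence if the read starts with it.
    start = len(key) if key and read.startswith(key) else 0
    for sample, barcode in barcodes.items():
        # The barcode is expected immediately after the key sequence.
        if read.startswith(barcode, start):
            return sample, read[start + len(barcode):]
    return None, read


# Hypothetical barcodes and read, for demonstration only.
barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}
print(assign_inline_barcode("TCAGACGTACGGGTTTAAA", barcodes))
# → ('sampleA', 'GGGTTTAAA')
```

A read whose prefix matches no barcode is returned unchanged with a sample of None, so the caller can decide whether to discard it.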
With current Illumina software and standard library preparation protocols,
the demultiplexing is usually done for you and the BaseSpace download
includes one FASTQ file for each sample; the index reads are not included.
However, it is sometimes useful to do the demultiplexing yourself, in which
case you can get "raw" i1, r1 and r2 reads.
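As a rough sketch of what demultiplexing raw reads involves, the Python below pairs each r1 record with the i1 record at the same position and groups the r1 reads by sample. It is an illustration under stated assumptions, not the usearch implementation: the helper names and example data are hypothetical, the two files are assumed to list reads in the same order, and only exact barcode matches are accepted.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (label, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def demux_by_index(r1_handle, i1_handle, barcodes):
    """Group r1 records by the sample whose barcode equals the i1 read.

    Assumes r1 and i1 list the same reads in the same order.
    """
    sample_of = {bc: sample for sample, bc in barcodes.items()}
    by_sample = defaultdict(list)
    records = zip(read_fastq(r1_handle), read_fastq(i1_handle))
    for (label, seq, qual), (_, index_seq, _) in records:
        sample = sample_of.get(index_seq, "unassigned")
        by_sample[sample].append((label, seq, qual))
    return dict(by_sample)


# Hypothetical in-memory data standing in for raw r1 and i1 FASTQ files.
r1 = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTTTGGGG\n+\nIIIIIIII\n")
i1 = io.StringIO("@read1\nAAGG\n+\nIIII\n@read2\nCCTT\n+\nIIII\n")
barcodes = {"sampleA": "AAGG", "sampleB": "CCTT"}
result = demux_by_index(r1, i1, barcodes)
print(sorted(result))
```

For paired reads, the r2 records would be carried along in the same loop so that both mates of a pair go to the same sample.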
With both 454 and
Illumina, reads are assigned to the wrong sample at a surprisingly high rate
because of incorrect barcode sequences. I call this problem
cross-talk. A suggested strategy for reducing
cross-talk is to use a sparse dual index scheme where most pairs of indexes
are not assigned to samples, so that a read with an incorrect index is more
likely to have an unassigned pair and can be discarded.
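The sparse scheme can be sketched in Python as follows. This is only an illustration: the index names, sample names and function name are hypothetical, and the count of unassigned pairs is at best a rough indicator, since a bad index can also produce a pair that happens to be assigned to another sample.

```python
from collections import Counter


def classify_index_pairs(observed_pairs, assigned):
    """Count reads per sample and reads whose index pair is not assigned.

    observed_pairs: iterable of (first_index, second_index) names, one per read.
    assigned: dict mapping an index pair to its sample name.
    """
    counts = Counter()
    unexpected = 0
    for pair in observed_pairs:
        sample = assigned.get(pair)
        if sample is None:
            # Pair not used by any sample: discard the read as likely cross-talk.
            unexpected += 1
        else:
            counts[sample] += 1
    return counts, unexpected


# Hypothetical sparse design: only two of the four possible pairs are used.
assigned = {("A1", "B1"): "sample1", ("A2", "B2"): "sample2"}
observed = [("A1", "B1"), ("A2", "B2"), ("A1", "B2"), ("A1", "B1")]
counts, unexpected = classify_index_pairs(observed, assigned)
print(dict(counts), unexpected)
# → {'sample1': 2, 'sample2': 1} 1
```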