Demultiplexing

How to demultiplex
If you have Illumina reads with one FASTQ file per sample, then demultiplexing has already been done for you.

If you have 454 reads with barcodes, or Illumina paired or unpaired reads with i1 index reads, then you can use the fastx_demux command to perform demultiplexing.

If you have raw Illumina dual index reads (I1 + I2 + R1 + R2), you can hack a solution by using fastq_join to concatenate the index reads into a single sequence, then make a barcodes FASTA file that matches the joined format (I1, NN.. spacer, reverse-complemented I2).
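As a rough illustration of the barcode layout described above, the following Python sketch builds one combined barcode per sample as I1, then a run of N's, then the reverse complement of I2. The function names, the sample names and the spacer length are assumptions made for this example; the spacer must be set to whatever the joining step actually inserts between the two index reads.

```python
# Sketch only: helper names and the default spacer length are assumptions,
# not part of any tool. Adjust spacer_len to match the joined index reads.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]

def combined_barcode(i1, i2, spacer_len=8):
    """Build I1 + N spacer + reverse-complemented I2."""
    return i1 + "N" * spacer_len + reverse_complement(i2)

def barcodes_fasta(samples, spacer_len=8):
    """Format a dict {sample_name: (i1, i2)} as FASTA text."""
    lines = []
    for name, (i1, i2) in samples.items():
        lines.append(">" + name)
        lines.append(combined_barcode(i1, i2, spacer_len))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical samples and index sequences, for illustration only.
    samples = {"SampleA": ("ACGTACGT", "AAGCTTGG"),
               "SampleB": ("TTGGCCAA", "CCATGGAA")}
    print(barcodes_fasta(samples), end="")
```

The resulting text can be saved as the barcodes FASTA file, provided the orientation and spacer agree with the joined index reads.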

Background
Several samples can be combined into a single sequencer run by using "multiplexing", where a barcode sequence identifying the sample is inserted into the sequencing construct. Barcodes are also called index sequences.

With Illumina sequencing, the barcode is usually positioned before the sequencing primer, so it does not appear in the forward reads that contain the biological sequence. Barcodes are obtained by making one (single-indexing) or two (dual-indexing) additional reads, which are sometimes called i1 for single indexing and i5+i7 for dual indexing.

With other next-generation sequencers, the barcode sequence usually appears at the beginning of the read, possibly after a machine-specific sequence such as TCAG for 454.
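To make the in-read barcode layout concrete, here is a minimal Python sketch that assigns a read to a sample when the barcode appears at the start of the read after a machine-specific key such as TCAG. It is not how any particular tool works: the function name is hypothetical, matching is exact (no mismatches allowed), and the barcodes are assumed to be such that none is a prefix of another.

```python
# Sketch only: exact-match demultiplexing of a read whose barcode follows
# a machine-specific key. Names and example sequences are assumptions.

def assign_sample(read, barcodes, key="TCAG"):
    """Return (sample, trimmed_read), or None if no barcode matches.

    barcodes maps sample name -> barcode sequence.
    """
    if not read.startswith(key):
        return None
    rest = read[len(key):]
    for sample, barcode in barcodes.items():
        if rest.startswith(barcode):
            # Strip the barcode so only the biological sequence remains.
            return sample, rest[len(barcode):]
    return None

if __name__ == "__main__":
    barcodes = {"SampleA": "ACGT", "SampleB": "TTGG"}
    print(assign_sample("TCAGACGTGGGGCCCC", barcodes))
    print(assign_sample("TCAGCCCCGGGGCCCC", barcodes))
```

Setting key to an empty string covers the case where the barcode is the very first thing in the read.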

With current Illumina software and standard library preparation protocols, the demultiplexing is usually done for you and the BaseSpace download includes one FASTQ file for each sample; the index reads are not included. However, it is sometimes useful to do the demultiplexing yourself, in which case you can get "raw" i1, r1 and r2 reads.

With both 454 and Illumina, reads are assigned to the wrong sample due to incorrect barcode sequences at a surprisingly high rate. I call this problem cross-talk. A suggested strategy for reducing cross-talk is to use a sparse dual index scheme where most pairs of indexes are not assigned to samples.
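The reason a sparse scheme helps can be sketched in a few lines of Python: when most index pairs are unassigned, a read whose two indexes are each valid but whose combination is unused can be flagged instead of being counted for a real sample. The function name, labels and index sequences below are assumptions for illustration, and the sketch only shows the classification idea, not a complete cross-talk filter.

```python
# Sketch only: classify a read by its (i1, i2) index pair under a sparse
# dual index scheme. All names and sequences here are hypothetical.

def classify_index_pair(i1, i2, assigned_pairs):
    """Return (status, sample) for an observed index pair.

    assigned_pairs maps (i1, i2) -> sample name.
    status is "sample", "cross-talk" (both indexes are in use, but not
    together) or "unknown" (at least one index is not in use at all).
    """
    if (i1, i2) in assigned_pairs:
        return "sample", assigned_pairs[(i1, i2)]
    used_i1 = {a for a, _ in assigned_pairs}
    used_i2 = {b for _, b in assigned_pairs}
    if i1 in used_i1 and i2 in used_i2:
        return "cross-talk", None
    return "unknown", None

if __name__ == "__main__":
    pairs = {("AAAA", "CCCC"): "SampleA", ("GGGG", "TTTT"): "SampleB"}
    print(classify_index_pair("AAAA", "CCCC", pairs))
    print(classify_index_pair("AAAA", "TTTT", pairs))
```

In a dense scheme every combination of used indexes belongs to some sample, so the second case could not be distinguished from a genuine read.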