
Robert C. Edgar on twitter

See also
  OTU / denoising pipeline
  Read preparation
  fastx_demux command

How to demultiplex
If you have Illumina reads with one FASTQ file per sample, then demultiplexing has already been done for you.

If you have 454 reads with barcodes, or Illumina paired or unpaired reads with i1 index reads, then you can use the fastx_demux command to perform demultiplexing.  

If you have raw Illumina dual-index reads (I1 + I2 + R1 + R2), you can hack a solution by using fastq_join to concatenate the index reads into a single sequence, then making a barcodes FASTA file whose sequences match the joined format (I1, NN.. spacer, reverse-complemented I2).
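As a rough illustration of the barcode format described above, the following Python sketch builds combined barcodes (I1, a run of N's, reverse-complemented I2) and formats them as FASTA. The function names and the default spacer length of 8 are assumptions made for this example; the spacer must match whatever the joining step actually writes, so inspect a few joined index reads before building the file.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def dual_index_barcode(i1, i2, spacer_len=8):
    """Build a combined barcode: I1, a run of N's, then reverse-complemented I2.

    spacer_len is an assumption; set it to the spacer length seen in the
    joined index reads.
    """
    return i1 + "N" * spacer_len + reverse_complement(i2)


def barcodes_fasta(samples, spacer_len=8):
    """Format {sample_name: (i1, i2)} as FASTA text, one record per sample."""
    lines = []
    for name, (i1, i2) in samples.items():
        lines.append(">" + name)
        lines.append(dual_index_barcode(i1, i2, spacer_len))
    return "\n".join(lines) + "\n"
```

For example, barcodes_fasta({"SampleA": ("ACGT", "AACG")}, spacer_len=2) gives a single record named SampleA with the sequence ACGTNNCGTT.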

Several samples can be combined into a single sequencer run by using "multiplexing", where a barcode sequence identifying the sample is inserted into the sequencing construct. Barcodes are also called index sequences.

With Illumina sequencing, the barcode is usually positioned before the sequencing primer, so it does not appear in the forward reads that contain the biological sequence. Barcodes are obtained by making one (single-indexing) or two (dual-indexing) additional reads, which are sometimes called i1 for single indexing and i5+i7 for dual indexing.

With other next-generation sequencers, the barcode sequence usually appears at the beginning of the read, possibly after a machine-specific sequence such as TCAG for 454.
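When the barcode sits at the start of the read, assigning a sample amounts to matching and trimming a prefix. The sketch below shows the idea in Python, under the assumptions that barcodes must match exactly and that all barcodes have the same length; the function name and the barcode table are hypothetical and are not part of any tool discussed here.

```python
def split_barcode(read, barcodes, key="TCAG"):
    """Return (sample, trimmed_read), or (None, read) if nothing matches.

    barcodes maps barcode sequence -> sample name.
    key is the machine-specific prefix (TCAG for 454); pass "" if there is none.
    """
    if key and not read.startswith(key):
        return None, read
    start = len(key)
    for barcode, sample in barcodes.items():
        # Exact match only; error-tolerant matching is left out of this sketch.
        if read.startswith(barcode, start):
            return sample, read[start + len(barcode):]
    return None, read
```

Here split_barcode("TCAGACGTGGGG", {"ACGT": "S1"}) returns ("S1", "GGGG"): the key and barcode are removed and the rest of the read is kept.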

With current Illumina software and standard library preparation protocols, the demultiplexing is usually done for you and the BaseSpace download includes one FASTQ file for each sample; the index reads are not included. However, it is sometimes useful to do the demultiplexing yourself, in which case you can get "raw" i1, r1 and r2 reads.
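To make the idea of demultiplexing raw reads concrete, here is a minimal Python sketch that pairs each read with the index read at the same position and groups reads by sample. It is an illustration only, not how fastx_demux works internally. It assumes that the read and index files list records in the same order, that each FASTQ record is exactly four lines, and that barcodes match exactly; all names are hypothetical.

```python
from collections import defaultdict


def read_fastq(handle):
    """Yield (label, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def demux_by_index(reads, index_reads, barcodes):
    """Group reads by sample using the index read at the same position.

    barcodes maps barcode sequence -> sample name. Reads whose index
    sequence is not in the table are dropped.
    """
    by_sample = defaultdict(list)
    for (label, seq, qual), (_, index_seq, _) in zip(reads, index_reads):
        sample = barcodes.get(index_seq)
        if sample is not None:
            by_sample[sample].append((label, seq, qual))
    return dict(by_sample)
```

A typical call would open the r1 and i1 files and pass read_fastq(r1_file) and read_fastq(i1_file) along with the barcode table. Paired reads could be handled the same way by iterating over r2 in parallel.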

With both 454 and Illumina, reads are assigned to the wrong sample due to incorrect barcode sequences at a surprisingly high rate. I call this problem cross-talk. A suggested strategy for reducing cross-talk is to use a sparse dual index scheme where most pairs of indexes are not assigned to samples.
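One way to see why a sparse dual index scheme helps: if most index pairs are not assigned to any sample, a read whose two indexes are each valid but whose combination is unused is presumably the result of an index error rather than a genuine read from a sample. The Python sketch below tallies reads on that basis. It is an illustration of the reasoning, not the author's implementation, and the function name and category labels are assumptions.

```python
from collections import Counter


def crosstalk_summary(index_pairs, assigned):
    """Count reads by whether their (i1, i2) pair is assigned to a sample.

    index_pairs: iterable of (i1, i2) tuples, one per read.
    assigned: dict mapping (i1, i2) -> sample name.
    """
    counts = Counter()
    valid_i1 = {i1 for i1, _ in assigned}
    valid_i2 = {i2 for _, i2 in assigned}
    for i1, i2 in index_pairs:
        if (i1, i2) in assigned:
            counts["assigned"] += 1
        elif i1 in valid_i1 and i2 in valid_i2:
            # Both indexes are in use, but not together: likely cross-talk.
            counts["unassigned_pair"] += 1
        else:
            counts["unknown_index"] += 1
    return counts
```

With a dense scheme, where every combination of indexes is assigned to a sample, the unassigned_pair category is always empty, so such errors cannot be detected this way.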