Dereplication for OTU clustering

See also
OTU / denoising pipeline

The input sequences to cluster_otus or unoise3 must be a set of unique sequences sorted in order of decreasing abundance with size annotations in the labels.
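The input requirements above can be checked mechanically. The following is a minimal sketch, not part of USEARCH, that assumes the sequences have already been read into a list of (label, sequence) pairs in file order and that abundances appear in the labels as `;size=N;` annotations. The function name `check_derep_input` is hypothetical.

```python
import re

# Assumed annotation style: "Uniq1;size=1234;" (trailing semicolon optional here).
SIZE_RE = re.compile(r";size=(\d+);?")


def check_derep_input(records):
    """Return a list of problems; an empty list means the input looks valid.

    records: list of (label, sequence) tuples in file order (hypothetical format).
    Checks that every label has a size annotation, that no sequence appears
    twice, and that sizes never increase from one record to the next.
    """
    problems = []
    seen = set()
    prev_size = None
    for label, seq in records:
        m = SIZE_RE.search(label)
        if m is None:
            problems.append(f"{label}: no size annotation")
            continue
        size = int(m.group(1))
        if seq in seen:
            problems.append(f"{label}: duplicate sequence")
        seen.add(seq)
        if prev_size is not None and size > prev_size:
            problems.append(f"{label}: size {size} > previous {prev_size}")
        prev_size = size
    return problems
```

For example, `check_derep_input([("Uniq1;size=2;", "AAA"), ("Uniq2;size=5;", "CCC")])` reports one problem, because the second sequence is more abundant than the first and so the records are not sorted by decreasing abundance.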

The fastx_uniques command can be used to find the unique sequences and add the size annotations.

Finding unique sequences is sometimes called "dereplication".

I suggest you use -relabel Uniq so that the unique sequences are labeled Uniq1, Uniq2 and so on. The input to fastx_uniques should be the reads after any quality filtering or length trimming.
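To illustrate what dereplication produces, here is a minimal Python sketch of the idea. It is not the fastx_uniques implementation: it assumes the reads are already in memory as plain strings, writes abundances as `;size=N;` annotations, and breaks ties between equally abundant sequences by first occurrence, which is a choice made for this sketch only. The function name `dereplicate` is hypothetical.

```python
from collections import Counter


def dereplicate(reads, prefix="Uniq"):
    """Collapse identical reads into uniques sorted by decreasing abundance.

    Returns (label, sequence) pairs labeled prefix1, prefix2, ... with a
    ";size=N;" annotation, mimicking the effect of relabeling with "Uniq".
    """
    # Counter remembers first-occurrence order (Python 3.7+).
    counts = Counter(reads)
    # sorted() is stable, so ties keep first-occurrence order.
    ordered = sorted(counts.items(), key=lambda item: -item[1])
    return [
        (f"{prefix}{i};size={n};", seq)
        for i, (seq, n) in enumerate(ordered, start=1)
    ]
```

For example, `dereplicate(["AAA", "CCC", "AAA", "GGG", "CCC", "AAA"])` gives `[("Uniq1;size=3;", "AAA"), ("Uniq2;size=2;", "CCC"), ("Uniq3;size=1;", "GGG")]`. Only exactly identical strings are merged, which is why filtering and trimming must happen first.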

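It is critically important that sequences are quality filtered and globally trimmed before dereplication. Dereplication merges only sequences that are exactly identical, so if reads of the same biological sequence are left with different lengths they are counted as different unique sequences and their abundance is split between them.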
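The effect of global trimming on dereplication can be shown with a toy example. This is a minimal sketch under stated assumptions: the three reads are invented and represent the same biological sequence read to different lengths, and `global_trim` is a hypothetical helper that truncates every read to one fixed length and discards reads that are too short (a short read cannot be extended to the common length).

```python
from collections import Counter


def global_trim(reads, length):
    """Truncate every read to the same length; drop reads shorter than it."""
    return [r[:length] for r in reads if len(r) >= length]


# Invented reads: one biological sequence, three different read lengths.
reads = ["ACGTACGTAA", "ACGTACGTA", "ACGTACGTAAC"]

untrimmed = Counter(reads)                 # three uniques, each with size 1
trimmed = Counter(global_trim(reads, 9))   # one unique with size 3
```

Without trimming the sequence is split into three singletons; after trimming to a common length it becomes a single unique with size 3.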

Samples should be pooled before dereplication
I recommend pooling samples, i.e. concatenating reads for all samples that were sequenced in the same run. This is important for getting the best detection of chimeras and cross-talk, and for getting the best sensitivity to low-abundance sequences that could be lost if individual samples or subsets of samples are clustered separately. See pooling samples for discussion.
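The sensitivity argument for pooling can be illustrated with invented data. This minimal sketch assumes, purely for illustration, a downstream step that discards uniques with abundance below 2; the threshold, the sample names and the helper `abundant_uniques` are all hypothetical. A sequence seen once in each of three samples is lost when samples are processed separately, but survives when the reads are concatenated first.

```python
from collections import Counter


def abundant_uniques(reads, min_size):
    """Dereplicate reads and keep uniques with abundance >= min_size."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_size}


# Invented reads for three samples from the same run.
samples = {
    "S1": ["AAAA", "AAAA", "CCGG"],
    "S2": ["AAAA", "CCGG"],
    "S3": ["TTTT", "CCGG"],
}

# Separately: CCGG is a singleton in every sample, so it is dropped each time.
per_sample = {name: abundant_uniques(r, min_size=2) for name, r in samples.items()}

# Pooled: concatenate all reads, then dereplicate once.
pooled_reads = [read for r in samples.values() for read in r]
pooled = abundant_uniques(pooled_reads, min_size=2)
```

Here `pooled` keeps CCGG with size 3, while none of the per-sample results contain it.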