Generating OTUs and ZOTUs

See also
  OTU / denoising pipeline
  Defining and interpreting OTUs

OTUs are sequences selected from the reads. The goal is to identify a set of correct biological sequences. See defining and interpreting OTUs for discussion.

With 97% clustering, an OTU sequence should be at least 3% different from all other OTUs, and should be the most abundant sequence in its neighborhood. This is done by the cluster_otus command, which is an implementation of the UPARSE algorithm.
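To make the 3% criterion concrete, here is a minimal sketch that computes percent identity between two sequences. It is only an illustration: pct_identity is a hypothetical helper, not a usearch command, and it assumes the two sequences are already aligned and have equal length. It does not reproduce what cluster_otus does internally.

```shell
# pct_identity: hypothetical helper, NOT part of usearch.
# Assumption: both sequences are pre-aligned and have the same length.
# Prints the percentage of positions at which the two sequences agree.
pct_identity() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a)
    if (n == 0 || n != length(b)) exit 1   # refuse unequal or empty input
    same = 0
    for (i = 1; i <= n; i++)
      if (substr(a, i, 1) == substr(b, i, 1)) same++
    printf "%.1f\n", 100 * same / n
  }'
}

# One mismatch in ten positions.
pct_identity ACGTACGTAC ACGTACGTAA   # prints 90.0
```

Under the 97% rule stated above, two OTU sequences compared this way would be expected to score 97.0 or lower.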

Denoising attempts to identify all correct biological sequences in the reads. This is done by the unoise3 command, which is an implementation of the UNOISE algorithm. A denoised sequence is called a "ZOTU" (zero-radius OTU).

ZOTUs are valid OTUs for diversity analysis etc., though the interpretation of the results is a bit different from the usual 97% OTUs. For example, it is expected that one species may have more than one ZOTU, whereas with 97% OTUs it is expected that an OTU may have more than one species.

The input data requirements of cluster_otus and unoise3 are the same: the input should be a set of unique sequences which have been globally trimmed and quality filtered. It is therefore easy to generate both 97% OTUs and ZOTUs in the same pipeline, and I suggest building an OTU table using both strategies.

Unique sequences are identified by the fastx_uniques command. The -sizeout option must be specified so that size annotations are included in the labels. I suggest using -relabel Uniq.
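As an illustration of what size annotations look like, the sketch below imitates dereplication with standard awk and sort. It is not how fastx_uniques is implemented: derep_sketch is a hypothetical helper, and it assumes each sequence occupies a single line and is consistently cased.

```shell
# derep_sketch: hypothetical helper, NOT part of usearch.
# Assumption: FASTA on stdin with each sequence on one line (no wrapping).
# Counts identical sequences and prints them by decreasing abundance,
# with labels of the form ">Uniq1;size=3;".
derep_sketch() {
  awk '!/^>/ && NF { count[$0]++ }
       END { for (s in count) print count[s] "\t" s }' |
    sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    awk -F '\t' '{ printf ">Uniq%d;size=%d;\n%s\n", NR, $1, $2 }'
}

# Four reads, three of them identical.
printf '>r1\nACGT\n>r2\nACGT\n>r3\nTTTT\n>r4\nACGT\n' | derep_sketch
# prints:
# >Uniq1;size=3;
# ACGT
# >Uniq2;size=1;
# TTTT
```

In the real pipeline this step is done by fastx_uniques, as in the commands below.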


usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -relabel Uniq

usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu

usearch -unoise3 uniques.fa -zotus zotus.fa
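
Since both cluster_otus and unoise3 rely on size annotations, it can be worth checking the labels in uniques.fa before running them. The following is a minimal sketch of such a check: size_report is a hypothetical helper, not a usearch command, and it assumes annotations of the form ;size=N in the label.

```shell
# size_report: hypothetical helper, NOT part of usearch.
# Prints three numbers for a FASTA file:
#   <number of labels> <labels without a size annotation> <sum of sizes>
size_report() {
  awk '/^>/ {
    labels++
    if (match($0, /;size=[0-9]+/))
      total += substr($0, RSTART + 6, RLENGTH - 6)   # digits after ";size="
    else
      missing++
  } END { printf "%d %d %d\n", labels, missing, total }' "$1"
}

# Usage, assuming uniques.fa exists in the current directory:
#   size_report uniques.fa
```

If the second number is not zero, some labels lack size annotations, which suggests -sizeout was omitted. The third number should equal the number of sequences in the input to fastx_uniques.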