Generating OTUs and ZOTUs

See also
OTU / denoising pipeline
Defining and interpreting OTUs

OTUs are sequences selected from the reads. The goal is to identify a set of correct biological sequences. See defining and interpreting OTUs for discussion.

With 97% clustering, an OTU sequence should be at least 3% different from all other OTUs, and should be the most abundant sequence in its neighborhood. This is done by the cluster_otus command, which is an implementation of the UPARSE algorithm.
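To make the 97% criterion concrete, here is a minimal Python sketch of greedy, abundance-ordered clustering. It is not the UPARSE algorithm and not what cluster_otus does internally: it ignores chimeras, and it measures identity position by position, which assumes all sequences have the same length (as they would after global trimming). The function names identity and greedy_otus are hypothetical, chosen for illustration only.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes globally trimmed, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(uniques, threshold=0.97):
    """Pick OTU sequences from (sequence, size) pairs.

    Sequences are visited in order of decreasing abundance. A sequence
    becomes a new OTU only if its identity to every existing OTU is below
    the threshold, so each OTU is the most abundant sequence in its
    neighborhood and differs by more than 3% from the other OTUs.
    """
    otus = []
    for seq, _size in sorted(uniques, key=lambda u: -u[1]):
        if all(identity(seq, otu) < threshold for otu in otus):
            otus.append(seq)
    return otus
```

In this sketch a low-abundance sequence that is 98% identical to an existing OTU is absorbed by it, while one that is 95% identical starts a new OTU.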

Denoising attempts to identify all correct biological sequences in the reads. This is done by the unoise3 command, which is an implementation of the UNOISE algorithm. A denoised sequence is called a "ZOTU" (zero-radius OTU).

ZOTUs are valid OTUs for diversity analysis etc., though the interpretation of the results is a bit different from the usual 97% OTUs. For example, it is expected that one species may have more than one ZOTU, while with 97% OTUs it is expected that a single OTU may contain more than one species.

The input requirements of cluster_otus and unoise3 are the same: a set of unique sequences which have been globally trimmed and quality filtered. It is therefore easy to generate both 97% OTUs and ZOTUs in the same pipeline, and I suggest building an OTU table using both strategies.

Unique sequences are identified by the fastx_uniques command. The -sizeout option must be specified so that size annotations are included in the labels. I suggest using -relabel Uniq.
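The idea behind unique sequences and size annotations can be illustrated with a short Python sketch. This is not the fastx_uniques implementation. It assumes labels of the form Uniq1;size=3; and assumes output ordered by decreasing abundance; the function names dereplicate and parse_size are hypothetical.

```python
from collections import Counter


def dereplicate(seqs, prefix="Uniq"):
    """Collapse identical sequences into (label, sequence) pairs.

    Each label carries a size annotation giving the number of times the
    sequence was seen. The ';size=N;' format and the abundance ordering
    are assumptions made for this illustration.
    """
    counts = Counter(seqs)
    # Most abundant first; ties broken alphabetically so output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (f"{prefix}{i};size={n};", seq)
        for i, (seq, n) in enumerate(ordered, start=1)
    ]


def parse_size(label):
    """Read the abundance back out of a size-annotated label."""
    for field in label.split(";"):
        if field.startswith("size="):
            return int(field[len("size="):])
    raise ValueError(f"no size annotation in label: {label!r}")
```

Downstream steps need the abundance of each unique sequence, which is why the annotation has to be written into the label at this stage.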

Example

usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -relabel Uniq

usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu

usearch -unoise3 uniques.fa -zotus zotus.fa
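If you prefer to drive these three steps from a script, a minimal Python wrapper might look like the sketch below. It only assembles the command lines shown above, with the same options and file names. The function names pipeline_commands and run_pipeline are hypothetical, and the sketch assumes the usearch binary is on your PATH.

```python
import shutil
import subprocess


def pipeline_commands(filtered="filtered.fa", exe="usearch"):
    """Return the three command lines from the example as argument lists."""
    uniques = "uniques.fa"
    return [
        # Find unique sequences, with size annotations in the labels.
        [exe, "-fastx_uniques", filtered, "-fastaout", uniques,
         "-sizeout", "-relabel", "Uniq"],
        # 97% OTUs.
        [exe, "-cluster_otus", uniques, "-otus", "otus.fa", "-relabel", "Otu"],
        # Denoised sequences (ZOTUs).
        [exe, "-unoise3", uniques, "-zotus", "zotus.fa"],
    ]


def run_pipeline(filtered="filtered.fa", exe="usearch"):
    """Run the steps in order, stopping at the first failure."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    for cmd in pipeline_commands(filtered, exe):
        subprocess.run(cmd, check=True)
```

Both cluster_otus and unoise3 read the same uniques.fa, so the unique sequences only need to be generated once.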