usearch manual
Generating OTUs and ZOTUs

See also
 
OTU / denoising pipeline
  Defining and interpreting OTUs
  Adding sizes to OTU labels

In my approach, OTUs are sequences selected from the reads. The goal is to identify a set of correct biological sequences. See defining and interpreting OTUs for discussion.

With 97% clustering, an OTU sequence should be at least 3% different from all other OTUs, and should be the most abundant sequence in its neighborhood. OTUs of this kind are generated by the cluster_otus command, which is an implementation of the UPARSE algorithm.
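To make the 97% threshold concrete, here is a minimal shell sketch that computes the percent identity of two sequences. It assumes the two sequences are already aligned, have equal length and contain no gaps, and it simply compares them position by position. This is not how cluster_otus measures identity; the function name pct_identity is hypothetical and is not part of usearch.

```shell
# Hypothetical helper (not a usearch command): percent identity of two
# equal-length, already-aligned sequences, compared position by position.
# Comparison is case-sensitive; gaps are not handled.
pct_identity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a)
        # Refuse empty input or sequences of different lengths.
        if (n == 0 || n != length(b)) exit 1
        same = 0
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) == substr(b, i, 1)) same++
        printf "%.1f\n", 100 * same / n
    }'
}
```

Under this simplified measure, two 100-nt sequences with 3 mismatches are 97.0% identical, so they sit right at the threshold. A sequence further than that from every existing OTU would be a candidate for a new 97% OTU.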

Denoising attempts to identify all correct biological sequences in the reads. This is done by the unoise3 command, which is an implementation of the UNOISE algorithm. A denoised sequence is called a "ZOTU" (zero-radius OTU).

ZOTUs are valid OTUs for diversity analysis etc., though the interpretation of the results is a bit different from the usual 97% OTUs. For example, it is expected that one species may have more than one ZOTU, whereas with 97% OTUs it is expected that an OTU may have more than one species.

The data requirements of cluster_otus and unoise3 are the same: the input should be a set of unique sequences which have been trimmed and quality filtered. It is therefore easy to generate both 97% OTUs and ZOTUs in the same pipeline, and I suggest building OTU tables using both strategies.

Unique sequences are identified by the fastx_uniques command. The -sizeout option must be specified so that size annotations are included in the labels. I suggest using -relabel Uniq.
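Because both commands depend on the size annotations, it can be worth checking the unique sequences before clustering. The sketch below counts FASTA labels that lack an annotation. It assumes labels look like >Uniq1;size=1234; with a trailing semicolon; the function name missing_sizes is hypothetical and is not a usearch command.

```shell
# Hypothetical helper (not a usearch command): count FASTA labels that
# lack a size=nnn; annotation. Prints 0 when every label is annotated.
missing_sizes() {
    # Label lines start with ">"; count those without "size=<digits>;".
    awk '/^>/ && !/size=[0-9]+;/ { n++ } END { print n + 0 }' "$1"
}
```

For example, missing_sizes uniques.fa should print 0 if -sizeout was specified when the unique sequences were generated.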

If you want size=nnn; annotations in the OTU or ZOTU sequence labels, see adding sizes.

Example

usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -relabel Uniq

usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu

usearch -unoise3 uniques.fa -zotus zotus.fa -relabel Zotu
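After running the commands above, a simple sanity check is to count how many sequences each strategy produced. This is a minimal sketch using standard awk only; the function name count_seqs is hypothetical and is not part of usearch.

```shell
# Hypothetical helper (not a usearch command): count the sequences in a
# FASTA file by counting its label lines (lines starting with ">").
count_seqs() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

Running count_seqs otus.fa and count_seqs zotus.fa gives the number of 97% OTUs and ZOTUs respectively. Since one species may have more than one ZOTU, the two counts need not agree.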