OTU / denoising pipeline
Defining and interpreting OTUs
OTUs are sequences selected from the reads. The goal is to identify a set of correct biological sequences. See Defining and interpreting OTUs for discussion.
With 97% clustering, an OTU sequence should be at least 3% different from all other OTUs and should be the most abundant sequence in its neighborhood. These OTUs are generated by the cluster_otus command, which implements the UPARSE algorithm.
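The abundance-ordered rule above can be illustrated with a simplified greedy clustering. This is a minimal sketch under stated assumptions, not the UPARSE implementation: it assumes the unique sequences are globally trimmed to equal length, measures identity as the fraction of matching positions (no gaps), and omits the chimera filtering that cluster_otus also performs. The names `identity` and `greedy_otus` are hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions.

    Assumption (not from USEARCH): sequences are globally trimmed to the
    same length, so no alignment or gaps are needed.
    """
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length, globally trimmed sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(uniques: List[Tuple[str, int]], min_id: float = 0.97) -> List[str]:
    """Hypothetical greedy OTU picker (simplified; no chimera filtering).

    `uniques` holds (sequence, abundance) pairs. Sequences are visited in
    order of decreasing abundance; a sequence becomes a new OTU only if it is
    less than `min_id` identical to every OTU chosen so far.
    """
    otus: List[str] = []
    for seq, _size in sorted(uniques, key=lambda u: -u[1]):
        if all(identity(seq, otu) < min_id for otu in otus):
            otus.append(seq)
    return otus
```

Visiting sequences in decreasing abundance is what makes each OTU the most abundant sequence in its neighborhood: by the time a less abundant sequence is considered, any more abundant sequence within 3% of it has already been chosen.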
Denoising attempts to identify all correct biological sequences in the reads. This is done by the unoise3 command, which implements the UNOISE algorithm. A denoised sequence is called a "ZOTU" (zero-radius OTU).
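UNOISE decides whether a sequence is a read error by comparing its abundance with that of a more abundant, similar sequence. The following is a minimal sketch assuming an abundance-skew rule of the form described for UNOISE: a sequence d edits away from an accepted ZOTU is treated as an error if its abundance divided by the ZOTU's abundance is at most β(d) = 1/2^(αd+1). The defaults α = 2 and a minimum abundance of 8 are assumptions here, chimera filtering and other details of unoise3 are omitted, and all function names are hypothetical.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions plus insertions/deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def beta(d: int, alpha: float = 2.0) -> float:
    """Assumed maximum abundance skew for a sequence d edits away to count as an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def unoise_sketch(uniques: List[Tuple[str, int]], alpha: float = 2.0,
                  minsize: int = 8) -> List[str]:
    """Hypothetical, simplified UNOISE-style denoising (no chimera filtering).

    Uniques are visited in order of decreasing abundance. A sequence is
    discarded as a probable error if some already accepted ZOTU is d edits
    away and size(seq) / size(zotu) <= beta(d).
    """
    zotus: List[Tuple[str, int]] = []
    for seq, size in sorted(uniques, key=lambda u: -u[1]):
        if size < minsize:
            continue  # too rare to be accepted (assumed minimum abundance)
        is_error = any(
            size / zsize <= beta(levenshtein(seq, zseq), alpha)
            for zseq, zsize in zotus
        )
        if not is_error:
            zotus.append((seq, size))
    return [seq for seq, _ in zotus]
```

With α = 2, a sequence one edit away from a ZOTU is discarded only if it is at most 1/8 as abundant; at two edits the threshold drops to 1/32.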
ZOTUs are valid OTUs for diversity analysis and similar purposes, though the results are interpreted somewhat differently from those obtained with the usual 97% OTUs. For example, one species may have more than one ZOTU, whereas one 97% OTU may contain more than one species.
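A little arithmetic shows why one 97% OTU can contain several ZOTUs. Assuming identity is measured as 1 - d/L for d mismatches over length L (no gaps), a 97% threshold allows up to floor(0.03 L) mismatches. For a hypothetical 253 bp amplicon that is 7 differences, so sequences differing at a handful of positions can be distinct ZOTUs yet fall within 3% of the same OTU sequence. The helper name below is hypothetical.

```python
import math


def max_diffs_within_radius(length: int, min_id: float = 0.97) -> int:
    """Largest mismatch count d for which identity 1 - d/length is still >= min_id.

    The small epsilon guards against floating-point error when
    length * (1 - min_id) is a whole number.
    """
    return math.floor(length * (1 - min_id) + 1e-9)
```

For example, `max_diffs_within_radius(253)` gives 7 and `max_diffs_within_radius(100)` gives 3.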
cluster_otus and unoise3 have the same input requirements: a set of unique sequences that have been globally trimmed and quality filtered. It is therefore easy to generate both 97% OTUs and ZOTUs in the same pipeline, and I suggest building an OTU table with both strategies.
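# Dereplicate the quality-filtered reads; -sizeout records the abundance of each unique sequence
usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -relabel Uniq
# 97% OTUs (UPARSE)
usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu
# Denoised sequences, i.e. ZOTUs (UNOISE)
usearch -unoise3 uniques.fa -zotus zotus.fa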
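The first command dereplicates the filtered reads. Its effect can be sketched as follows: identical sequences are collapsed, each unique sequence is labeled with its abundance, and the output is ordered by decreasing abundance (assumed here). This is an illustration only: `dereplicate` and `to_fasta` are hypothetical helpers, the input is assumed to be a plain list of sequences rather than a FASTA file, ties are broken alphabetically by assumption, and the exact label punctuation written by USEARCH may differ between versions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(seqs: Iterable[str], prefix: str = "Uniq") -> List[Tuple[str, str]]:
    """Collapse identical sequences and annotate each with its abundance.

    Returns (label, sequence) pairs sorted by decreasing abundance, with
    labels in a 'Uniq1;size=N' style (assumed format).
    """
    counts = Counter(seqs)
    # Most abundant first; ties broken by sequence so output is deterministic
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"{prefix}{i};size={n}", seq)
            for i, (seq, n) in enumerate(ordered, start=1)]


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Render (label, sequence) pairs as FASTA text."""
    return "".join(f">{label}\n{seq}\n" for label, seq in records)
```

For instance, dereplicating three copies of one sequence and one copy of another yields two records labeled `Uniq1;size=3` and `Uniq2;size=1`.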