Generating OTUs and ZOTUs
OTU / denoising
Defining and interpreting OTUs
Adding sizes to OTU labels
In my approach, OTUs are sequences selected from the reads. The goal is to
identify a set of correct biological sequences. See
"Defining and interpreting OTUs" for discussion.
With 97% clustering, an OTU sequence should be at least 3% different from
all other OTUs, and should be the most abundant sequence in its
neighborhood. This is done by the
cluster_otus command, which is an implementation of the UPARSE algorithm.
Denoising attempts to identify all correct biological sequences in the
reads. This is done by the unoise3 command,
which is an implementation of the UNOISE
algorithm. A denoised sequence is called a "ZOTU" (zero-radius OTU).
ZOTUs are valid OTUs for diversity analysis etc., though the interpretation
of the results is a bit different from the usual 97% OTUs. For example, it is expected that one
species may have more than one ZOTU, whereas with 97% OTUs it is expected that
an OTU may contain more than one species.
The data requirements of cluster_otus and
unoise3 are the same: the input should be a
set of unique sequences which have been trimmed and quality filtered. This
makes it easy to generate both 97% OTUs and ZOTUs in the same pipeline, and
I suggest building an OTU table using both strategies.
Unique sequences are identified by the
fastx_uniques command. The -sizeout option must be specified so that
size annotations are included in the labels. I
suggest using -relabel Uniq.
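To make the size annotations concrete, here is a minimal shell sketch of what dereplication produces. It uses standard Unix tools on a hypothetical three-read input rather than usearch itself, and it assumes each sequence sits on a single line. The read names r1 to r3 are made up for illustration.

```shell
# Toy stand-in for dereplication (NOT the fastx_uniques implementation).
# Hypothetical input: three reads, two of them identical.
reads=$(printf '>r1\nACGT\n>r2\nGGCC\n>r3\nACGT\n')

# Drop the labels, count identical sequences, sort by decreasing
# abundance, then write labels in the Uniq<N>;size=<count>; style.
uniques=$(printf '%s\n' "$reads" | grep -v '^>' | sort | uniq -c |
  sort -k1,1nr -k2,2 |
  awk '{ printf ">Uniq%d;size=%d;\n%s\n", NR, $1, $2 }')

printf '%s\n' "$uniques"
```

The first label printed is `>Uniq1;size=2;`, recording that ACGT was seen twice. This is the kind of abundance information that -sizeout adds to the labels.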
If you want size=nnn; annotations in the OTU or ZOTU sequence labels, see
"Adding sizes to OTU labels".
usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -relabel Uniq
usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu
usearch -unoise3 uniques.fa -zotus zotus.fa -relabel Zotu
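To build an OTU table with both strategies, as suggested above, the otutab command can be run once against each set of sequences. The following is a sketch only. The names reads.fq, otutab.txt and zotutab.txt are hypothetical, and the run helper is an illustrative dry-run wrapper (not part of usearch) that prints each command unless DRY_RUN=0 is set. Check that your usearch version supports the -zotus option of otutab before relying on it.

```shell
# Dry-run helper (illustrative, not part of usearch): prints the command
# unless DRY_RUN=0, in which case the command is executed.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# OTU table for the 97% OTUs (file names are placeholders).
run usearch -otutab reads.fq -otus otus.fa -otutabout otutab.txt

# OTU table for the ZOTUs.
run usearch -otutab reads.fq -zotus zotus.fa -otutabout zotutab.txt
```

With the default DRY_RUN=1 the two command lines are only echoed, so they can be reviewed before running them for real with DRY_RUN=0.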