OTU / denoising pipeline
Tutorials with data, scripts, and exercises with solutions (new Aug 2017)
Defining and interpreting OTUs
Making an OTU table (otutab command)
Should I use UPARSE or UNOISE?
The cluster_otus command performs
97% OTU clustering using the UPARSE-OTU algorithm.
Chimeras are filtered by this command. This chimera
filtering is much better than using UCHIME so I do not recommend using
reference-based chimera filtering as a post-processing step, except as a
manual check, because false positives are common.
For most purposes, I consider 97% OTU clustering
obsolete. It is better to use the unoise command
to recover the full set of biological sequences in the reads. These are also
valid OTUs; I call them "ZOTUs" for zero-radius OTUs, to emphasize this. See
defining and interpreting OTUs and the UNOISE paper.
Input to cluster_otus is a
FASTA file containing quality filtered,
globally trimmed and
dereplicated reads from a marker gene amplicon
sequencing experiment, e.g. 16S or ITS. It is generally recommended that
singleton reads should be discarded.
See OTU / denoising pipeline for discussion of how to prepare reads before clustering.
It is strongly recommended
that you follow the pipeline recommendations; otherwise the accuracy of the
OTUs will probably be compromised.
Input sequence labels must have size annotations
giving the abundance of the unique sequence. Size annotations are generated by
the -sizeout option of clustering commands; typically
fastx_uniques is used.
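To illustrate what dereplicated input with size annotations looks like, here is a minimal Python sketch. It is not the fastx_uniques implementation; it only mimics the idea of collapsing identical reads and recording their abundance in the label. The `dereplicate` function and the `Uniq` label prefix are hypothetical names chosen for the example, and exact string matching is an assumption.

```python
from collections import Counter

def dereplicate(seqs, prefix="Uniq"):
    """Collapse identical sequences and annotate each with its abundance.

    Illustrative only: sequences are compared by exact string match, and
    the label prefix is a made-up name, not something usearch requires.
    """
    counts = Counter(seqs)
    # Most abundant first; ties are broken by sequence for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"{prefix}{i};size={n};", seq)
            for i, (seq, n) in enumerate(ordered, 1)]

reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
uniques = dereplicate(reads)
for label, seq in uniques:
    print(f">{label}\n{seq}")
```

The last record printed is a singleton (size=1), which is the kind of sequence that is normally discarded before clustering.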
The -minsize option can be used to specify a minimum abundance.
The default value is 2, which discards singletons.
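The effect of a minimum abundance threshold can be sketched in Python as follows. This is an illustration of the filtering idea, not how usearch implements it; `parse_size` and `filter_minsize` are hypothetical helpers, and the label layout (semicolon-separated fields with a `size=nnn` field) is assumed from the size=nnn; annotation format.

```python
import re

# Matches a size=nnn field delimited by semicolons or the label boundaries.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")

def parse_size(label):
    """Return the abundance stored in a size=nnn; annotation."""
    m = SIZE_RE.search(label)
    if m is None:
        raise ValueError(f"no size annotation in label: {label!r}")
    return int(m.group(1))

def filter_minsize(records, minsize=2):
    """Keep (label, sequence) records whose abundance is at least minsize."""
    return [(label, seq) for label, seq in records
            if parse_size(label) >= minsize]

records = [
    ("Uniq1;size=3;", "ACGT"),
    ("Uniq2;size=2;", "TTGA"),
    ("Uniq3;size=1;", "GGCC"),  # singleton
]
kept = filter_minsize(records)  # threshold of 2 drops the singleton
print([label for label, _ in kept])
```

A label without a size annotation raises an error in this sketch, which mirrors the requirement that input labels carry size annotations.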
The identity threshold is fixed at 97%. See
defining and interpreting OTUs for discussion. See
UPARSE OTU radius for making OTUs at different identities.
The -otus option specifies a FASTA output file for the
OTU representative sequences. By default, OTU labels are taken from the input
file, with size annotations stripped. The -relabel
option specifies a string that is used to re-label OTUs. If -relabel xxx is
specified, then the labels are xxx followed by 1, 2 ... up to the number of
OTUs. OTU identifiers in the labels are required for
making an OTU table with the otutab command.
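The two labeling behaviors described above can be sketched in Python. This is only an illustration under assumptions: `strip_size` and `relabel` are hypothetical helpers, and the exact form of a stripped label (for example, whether a trailing semicolon remains) is a guess rather than documented usearch behavior.

```python
def strip_size(label):
    """Drop the size=nnn field from a semicolon-separated label.

    Assumption: fields are separated by ';' and empty fields are discarded.
    """
    fields = [f for f in label.split(";") if f and not f.startswith("size=")]
    return ";".join(fields)

def relabel(labels, prefix=None):
    """Mimic the labeling rules: strip sizes, or number with a prefix."""
    if prefix is None:
        return [strip_size(label) for label in labels]
    # With a prefix, labels become prefix1, prefix2, ... in input order.
    return [f"{prefix}{i}" for i in range(1, len(labels) + 1)]

labels = ["Uniq1;size=3;", "Uniq2;size=2;"]
print(relabel(labels))         # labels taken from the input, sizes stripped
print(relabel(labels, "Otu"))  # numbered labels, as with -relabel Otu
```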
The -uparseout option
specifies a tabbed text output file documenting how the input sequences were
classified.
The -uparsealnout option specifies a text file
containing a human-readable alignment of each query sequence to its
If you want size=nnn; annotations in the OTU or ZOTU
sequence labels, see adding sizes.
Parsimony score options
Alignment parameters and
heuristics are supported.
usearch -cluster_otus uniques.fa -otus otus.fa \
  -uparseout uparse.txt -relabel Otu
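If the command is driven from a script, one possible approach is to assemble the argument list in Python. The sketch below uses only options mentioned on this page (-otus, -uparseout, -relabel, -minsize); the helper name `cluster_otus_cmd` and the assumption that the executable is called `usearch` and is on the PATH are illustrative, not part of usearch.

```python
import shlex

def cluster_otus_cmd(uniques, otus, uparseout=None, relabel=None,
                     minsize=None, exe="usearch"):
    """Build an argument list for a cluster_otus run (hypothetical helper)."""
    cmd = [exe, "-cluster_otus", uniques, "-otus", otus]
    if uparseout is not None:
        cmd += ["-uparseout", uparseout]
    if relabel is not None:
        cmd += ["-relabel", relabel]
    if minsize is not None:
        cmd += ["-minsize", str(minsize)]
    return cmd

cmd = cluster_otus_cmd("uniques.fa", "otus.fa",
                       uparseout="uparse.txt", relabel="Otu")
print(shlex.join(cmd))
```

The list could then be passed to `subprocess.run(cmd, check=True)`, which requires usearch to be installed; building the list separately keeps the command easy to inspect before running it.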