The cluster_otus command performs
OTU clustering using the UPARSE-OTU algorithm.
Input is a
FASTA file containing quality filtered,
globally trimmed and
dereplicated reads from a marker gene amplicon
sequencing experiment, e.g. 16S or ITS. It is generally recommended that
singleton reads should be discarded. See
UPARSE pipeline for discussion of how to prepare reads before clustering.
Reads must be trimmed to minimize terminal gaps
in alignments of closely related sequences. This is
because cluster_otus considers terminal gaps to be differences that reduce
sequence identity, unlike most other commands in USEARCH.
See global trimming for discussion.
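The effect of terminal gaps on identity can be illustrated with a small sketch. It assumes a simple definition of identity (matching columns divided by counted alignment columns), which is not necessarily the exact formula used by cluster_otus; the function name pct_identity is hypothetical.

```python
def pct_identity(row_a, row_b, count_terminal_gaps):
    """Percent identity of two rows of a pairwise alignment ('-' marks a gap).

    Assumed definition: matching columns / counted columns * 100.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    start, end = 0, len(row_a)
    if not count_terminal_gaps:
        # Skip leading and trailing columns where either row has a gap.
        while start < end and "-" in (row_a[start], row_b[start]):
            start += 1
        while end > start and "-" in (row_a[end - 1], row_b[end - 1]):
            end -= 1
    columns = end - start
    if columns == 0:
        return 0.0
    matches = sum(
        1
        for x, y in zip(row_a[start:end], row_b[start:end])
        if x == y and x != "-"
    )
    return 100.0 * matches / columns


# Two identical reads, one trimmed two bases shorter than the other.
long_read = "ACGTACGTAC"
short_read = "ACGTACGT--"
print(pct_identity(long_read, short_read, count_terminal_gaps=True))
print(pct_identity(long_read, short_read, count_terminal_gaps=False))
```

Under this definition the two reads are 80% identical when terminal gaps count as differences and 100% identical when they are ignored, which is why reads of unequal length can be split into separate OTUs if they are not globally trimmed.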
Input sequence labels must have size annotations
giving the abundance of the unique sequence. Size annotations are generated by
the -sizeout option of clustering commands; typically
derep_fulllength is used.
The -minsize option can be used to specify a minimum abundance; for
example you can use -minsize 2 to discard singletons (this option requires
v8.1.1803 or later).
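As a rough illustration of what size annotations and a minimum abundance filter amount to, here is a minimal sketch. It assumes the annotation appears in the label as size=N set off by semicolons (for example Uniq1;size=42;); the helper names parse_size and filter_min_size are hypothetical and are not part of USEARCH.

```python
import re

# Assumed annotation syntax: "size=N" set off by semicolons, e.g. "Uniq1;size=42;".
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def parse_size(label):
    """Return the abundance encoded in a sequence label."""
    match = SIZE_RE.search(label)
    if match is None:
        raise ValueError(f"no size annotation in label: {label!r}")
    return int(match.group(1))


def filter_min_size(records, min_size=2):
    """Keep (label, sequence) pairs whose abundance is at least min_size."""
    return [(label, seq) for label, seq in records if parse_size(label) >= min_size]


records = [("Uniq1;size=42;", "ACGT"), ("Uniq2;size=1;", "ACGA")]
print(filter_min_size(records))
```

With min_size=2 the singleton Uniq2 is dropped, which mirrors the intent of -minsize 2.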
The -otu_radius_pct option specifies the OTU "radius"
as a percentage, i.e. the maximum difference between an OTU member sequence and the
representative sequence of that OTU. Default is 3.0, corresponding to a minimum identity of 97%.
It is usually not recommended to
use an otu_radius_pct value greater than 3; see
UPARSE OTU radius for discussion.
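The relationship between the radius and the identity threshold is simple arithmetic, sketched below. The function names are hypothetical; the fractional form is shown only because identity thresholds are often written as a fraction such as 0.97.

```python
def radius_to_min_identity_pct(otu_radius_pct):
    """Convert an OTU radius (percent difference) to a minimum percent identity."""
    if not 0.0 <= otu_radius_pct <= 100.0:
        raise ValueError("radius must be between 0 and 100")
    return 100.0 - otu_radius_pct


def radius_to_min_identity_fraction(otu_radius_pct):
    """Same threshold expressed as a fraction between 0 and 1."""
    return radius_to_min_identity_pct(otu_radius_pct) / 100.0


print(radius_to_min_identity_pct(3.0))
print(radius_to_min_identity_fraction(3.0))
```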
The -otus option specifies a FASTA output file for the
OTU representative sequences. By default, OTU labels are taken from the input
file, with size annotations stripped. The -relabel
option specifies a string that is used to re-label OTUs. If -relabel xxx is
specified, then the labels are xxx followed by 1, 2 ... up to the number of
OTUs. OTU identifiers in the labels are required for
making an OTU table using usearch_global.
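The relabeling scheme described above (prefix followed by 1, 2, ...) can be sketched as follows. This is an illustration of the naming convention only, not USEARCH's implementation; relabel_otus is a hypothetical helper.

```python
def relabel_otus(records, prefix):
    """Replace each label with prefix + running number, starting at 1.

    records is a list of (label, sequence) pairs in output order.
    """
    return [
        (f"{prefix}{number}", sequence)
        for number, (_old_label, sequence) in enumerate(records, start=1)
    ]


representatives = [("Uniq1;size=42;", "ACGT"), ("Uniq7;size=9;", "GGCC")]
for label, sequence in relabel_otus(representatives, "OTU"):
    print(f">{label}")
    print(sequence)
```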
If the -sizeout option
is specified, then a size annotation is appended
to the OTU label giving the total number of sequences assigned to that OTU,
calculated as the sum of the size annotations of sequences assigned to that OTU.
If you use -sizeout, you should also use -sizein so that the input sequence size
annotations are counted.
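The size calculation described above is a sum over member annotations. A minimal sketch, assuming the size=N; label syntax and a hypothetical helper name otu_label_with_size:

```python
import re

# Assumed annotation syntax: "size=N" set off by semicolons.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def otu_label_with_size(otu_label, member_labels):
    """Append the summed abundance of all member sequences to an OTU label."""
    total = 0
    for label in member_labels:
        match = SIZE_RE.search(label)
        if match is None:
            raise ValueError(f"no size annotation in label: {label!r}")
        total += int(match.group(1))
    return f"{otu_label};size={total};"


members = ["Uniq1;size=10;", "Uniq4;size=3;", "Uniq9;size=2;"]
print(otu_label_with_size("OTU1", members))
```

If the input labels carried no size annotations, there would be nothing meaningful to sum, which is the reason for pairing -sizeout with -sizein.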
The -uparseout option
specifies a tabbed text output file documenting how the input sequences were
The -uparsealnout option specifies a text file
containing a human-readable alignment of each query sequence to its
Parsimony score options, alignment parameters and
heuristics are supported.
usearch -cluster_otus derep.fa -otus otus.fa -uparseout out.up -relabel OTU -minsize 2
usearch -usearch_global all_reads.fa -db otus.fa -strand plus -id 0.97 -otutabout otu_table.txt