cluster_otus command

See also
UPARSE pipeline
OTU benchmark results
Making an OTU table (mapping reads to OTUs)

The cluster_otus command performs OTU clustering using the UPARSE-OTU algorithm.

Input is a FASTA file containing quality filtered, globally trimmed and dereplicated reads from a marker gene amplicon sequencing experiment, e.g. 16S or ITS. It is generally recommended that singleton reads should be discarded. See UPARSE pipeline for discussion of how to prepare reads before clustering.

Input sequences must be trimmed to minimize terminal gaps in alignments of closely related sequences. This is critically important because cluster_otus considers terminal gaps to be differences that reduce sequence identity, unlike most other commands in USEARCH. See global trimming for discussion.

Input sequence labels must have size annotations giving the abundance of the unique sequence. Size annotations are generated by the -sizeout option of clustering commands; typically derep_fulllength is used.
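As a quick check that the input carries size annotations, the labels can be listed with their abundances. This is a minimal sketch that does not use USEARCH itself; it assumes the annotation has the form ;size=N; in the FASTA label, and the function name list_sizes and the sample records are illustrative only.

```shell
# Print "label<TAB>size" for every FASTA record read from stdin.
# Assumes a USEARCH-style ";size=N;" annotation in the label;
# records without one are reported with size "NA".
list_sizes() {
  awk '/^>/ {
    label = substr($0, 2)                 # drop the leading ">"
    size = "NA"
    if (match(label, /;size=[0-9]+/)) {
      size = substr(label, RSTART + 6, RLENGTH - 6)
    }
    sub(/;size=[0-9]+;?/, "", label)      # strip the annotation from the label
    print label "\t" size
  }'
}

# Illustrative input: two unique sequences with abundances 120 and 1.
printf '>Uniq1;size=120;\nACGT\n>Uniq2;size=1;\nACGA\n' | list_sizes
```

Any record reported as NA would need to be re-generated with -sizeout before clustering.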

The -minsize option can be used to specify a minimum abundance; for example you can use -minsize 2 to discard singletons (this option requires v8.1.1803 or later).
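To see how many unique sequences a given threshold would remove before running the command, the size annotations can be counted directly. The sketch below is a stand-alone helper, not part of USEARCH; count_below_minsize is a hypothetical name and the ;size=N; label format is assumed.

```shell
# Count FASTA records on stdin whose ";size=N" annotation is below
# the threshold given as the first argument.
count_below_minsize() {
  awk -v min="$1" '/^>/ && match($0, /;size=[0-9]+/) {
    if (substr($0, RSTART + 6, RLENGTH - 6) + 0 < min + 0) n++
  }
  END { print n + 0 }'
}

# Illustrative input: one abundant sequence and two singletons.
printf '>U1;size=5;\nACGT\n>U2;size=1;\nACGA\n>U3;size=1;\nACGC\n' \
  | count_below_minsize 2
```

With a threshold of 2, the count is the number of singletons in the input.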

The -otu_radius_pct option specifies the OTU "radius" as a percentage, i.e. the maximum difference between an OTU member sequence and the representative sequence of that OTU. Default is 3.0, corresponding to a minimum identity of 97%. It is usually not recommended to use an otu_radius_pct value greater than 3; see UPARSE OTU radius for discussion.
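The relationship between radius and identity is simple arithmetic: identity = (100 - radius) / 100. The helper below is only an illustration of that conversion (radius_to_identity is a hypothetical name); it can be handy for keeping the clustering radius consistent with an identity threshold expressed as a fraction, such as the -id value in the example at the end of this page.

```shell
# Convert an OTU radius in percent to a fractional identity.
radius_to_identity() {
  awk -v r="$1" 'BEGIN { printf "%.2f\n", (100 - r) / 100 }'
}

radius_to_identity 3.0   # prints 0.97
radius_to_identity 1.0   # prints 0.99
```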

The -otus option specifies a FASTA output file for the OTU representative sequences. By default, OTU labels are taken from the input file, with size annotations stripped. The -relabel option specifies a string that is used to re-label OTUs. If -relabel xxx is specified, then the labels are xxx followed by 1, 2 ... up to the number of OTUs. OTU identifiers in the labels are required for making an OTU table using usearch_global.

If the -sizeout option is specified, then a size annotation is appended to the OTU label giving the total number of sequences assigned to that OTU, calculated as the sum of the size annotations of sequences assigned to that OTU. If you use -sizeout, you should also use -sizein so that the input sequence size annotations are counted.
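One way to sanity-check the output is to total the size annotations in the OTU file and compare the result with the same total for the input. This is a sketch under the assumption that labels carry ;size=N; annotations; total_size is a hypothetical helper and the sample labels are illustrative. The two totals need not match if some input sequences are not assigned to any OTU.

```shell
# Sum the ";size=N" annotations over all FASTA records on stdin.
total_size() {
  awk '/^>/ && match($0, /;size=[0-9]+/) {
    total += substr($0, RSTART + 6, RLENGTH - 6)
  }
  END { print total + 0 }'
}

# Illustrative OTU file contents: two OTUs with sizes 150 and 42.
printf '>OTU1;size=150;\nACGT\n>OTU2;size=42;\nACGA\n' | total_size   # prints 192
```

A total of zero for the OTU file would suggest that -sizein was omitted or that the input labels had no size annotations.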

The -uparseout option specifies a tab-separated text output file documenting how the input sequences were classified.

The -uparsealnout option specifies a text file containing a human-readable alignment of each query sequence to its UPARSE-REF model.

Parsimony score options are supported.

Alignment parameters and heuristics are supported.

Example

usearch -cluster_otus derep.fa -otus otus.fa -uparseout out.up -relabel OTU -minsize 2

usearch -usearch_global all_reads.fa -db otus.fa -strand plus -id 0.97 -otutabout otu_table.txt