cluster_otus command

See also
UPARSE pipeline
UPARSE commands
OTU benchmark results
Mapping reads to OTUs

The cluster_otus command performs OTU clustering using the UPARSE-OTU algorithm.

Input is a FASTA file containing quality-filtered and globally trimmed reads from a marker gene amplicon sequencing experiment, e.g. 16S or ITS. Paired reads must be merged before clustering. It is generally recommended that singleton reads be discarded before clustering to minimize spurious OTUs.

Input sequences must be trimmed to minimize terminal gaps in alignments of closely related sequences. This is critically important because cluster_otus considers terminal gaps to be differences that reduce sequence identity, unlike most other commands in USEARCH. See global trimming for discussion.

Input sequence labels must have size annotations giving the abundance of the unique sequence.
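To make the size annotation requirement concrete, here is a minimal Python sketch that reads the abundance from a sequence label. It assumes the annotation has the form ;size=N; within the label, which is not spelled out on this page; the function name parse_size and the default of 1 for unannotated labels are hypothetical choices for illustration only.

```python
import re

# Assumed annotation format: "size=N" delimited by semicolons, e.g. "Uniq1;size=42;".
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")

def parse_size(label, default=1):
    """Return the abundance stored in a label, or `default` if none is present."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else default

print(parse_size("Uniq1;size=42;"))
print(parse_size("Uniq2"))
```

A check like this can be run over the labels of the input FASTA file before clustering to confirm that every sequence carries an abundance.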

The -otu_radius_pct option specifies the OTU "radius" as a percentage, i.e. the maximum difference between an OTU member sequence and the representative sequence of that OTU. Default is 3.0, corresponding to a minimum identity of 97%. It is usually not recommended to use an otu_radius_pct value greater than 3; see UPARSE OTU radius for discussion. I am working on a modified version of the algorithm that will enable larger radius values; contact me if you are interested in using this feature.
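The relationship between radius and identity can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper names are hypothetical, and the difference measure used here (differing alignment columns divided by total columns, with terminal gaps counted as differences) is an assumption, not the exact definition used by cluster_otus.

```python
def min_identity_pct(radius_pct):
    """Minimum percent identity implied by an OTU radius given as a percentage."""
    return 100.0 - radius_pct

def within_radius(diffs, columns, radius_pct=3.0):
    """True if `diffs` differing columns out of `columns` is within the radius.

    Assumes terminal gaps have already been counted in `diffs`.
    """
    return 100.0 * diffs / columns <= radius_pct

print(min_identity_pct(3.0))
print(within_radius(3, 100))
print(within_radius(8, 250))
```

With the default radius, 8 differences over 250 columns is a 3.2% difference and so falls outside the OTU, which shows why untrimmed terminal gaps can push closely related sequences apart.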

The -otus option specifies a FASTA output file for the OTU representative sequences. By default, OTU labels are taken from the input file, with size annotations stripped. The -relabel option specifies a string that is used to re-label OTUs. If -relabel xxx is specified, then the labels are xxx followed by 1, 2 ... up to the number of OTUs.

If the -sizeout option is specified, then a size annotation is appended to the OTU label giving the total number of sequences assigned to that OTU, calculated as the sum of the size annotations of sequences assigned to that OTU. If you use -sizeout, you should also use -sizein so that the input sequence size annotations are counted.
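The labeling rules for -relabel and -sizeout can be illustrated with a short Python sketch. It is not how usearch is implemented: the function otu_labels and its input (one list of member abundances per OTU) are hypothetical, and the ;size=N; output format is an assumption.

```python
def otu_labels(otus, prefix="OTU_", sizeout=True):
    """Build one label per OTU from the abundances of its member sequences.

    `otus` is a list of lists; each inner list holds the size annotations
    of the sequences assigned to that OTU.
    """
    labels = []
    for index, member_sizes in enumerate(otus, start=1):
        label = f"{prefix}{index}"  # -relabel: prefix followed by 1, 2, ...
        if sizeout:
            # -sizeout: OTU size is the sum of the member size annotations.
            label += f";size={sum(member_sizes)};"
        labels.append(label)
    return labels

print(otu_labels([[10, 3, 2], [5]]))
```

If the input sizes were ignored, each member would contribute nothing meaningful to the sum, which is why -sizein should accompany -sizeout.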

The -uparseout option specifies a tabbed text output file documenting how the input sequences were classified.

Parsimony score options are supported.

Alignment parameters and heuristics are supported.

Example

usearch -cluster_otus derep2.fa -otus otus.fa -uparseout out.up -relabel OTU_ -sizein -sizeout