cluster_fast command

See also
cluster_smallmem
cluster_otus
cluster_agg
cluster_aggd

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed.

An identity threshold must be specified using the -id option.
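To make the role of the identity threshold concrete, here is a minimal Python sketch of greedy centroid-based clustering. It is not the USEARCH implementation: the identity function is a crude ungapped stand-in for a real alignment, the first matching centroid is accepted, and the names identity and greedy_cluster are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Crude stand-in for alignment identity (assumption): fraction of
    matching positions over the shorter sequence, compared without gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid that meets the threshold,
    otherwise make it the centroid of a new cluster.
    Returns a list of (centroid, members) pairs."""
    clusters = []
    for seq in seqs:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No existing centroid was similar enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "ACGTACGAAC"]
clusters = greedy_cluster(seqs, 0.9)
for centroid, members in clusters:
    print(centroid, len(members))
```

In this toy input, lowering the threshold merges more sequences into fewer clusters, while raising it produces more centroids.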

Sequences are processed in the order specified by the -sort option, which may be other (the default), length or size. See UCLUST sort order for discussion. If -sort length is specified, then sequences are processed in order of decreasing length. This is most appropriate when fragments are present together with full-length sequences. If -sort size is specified, then sequences are processed in order of decreasing size annotation. This can be useful for clustering of amplicon reads such as 16S or ITS tags, though cluster_otus is usually recommended for this task. If -sort other is used (the default), then the input sequences are processed in the order they appear in the input file.
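The three sort orders can be illustrated with a short Python sketch. It only shows the resulting processing order for a toy input. It assumes size annotations of the form size=N; in the sequence label and treats a missing annotation as size 1; the helper names size_of and processing_order are hypothetical.

```python
import re

# Assumed annotation format: "size=N" delimited by semicolons in the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def size_of(label: str) -> int:
    """Return the size annotation in a label, or 1 if absent (assumption)."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1


def processing_order(records, sort="other"):
    """records is a list of (label, sequence) pairs in input-file order."""
    if sort == "other":
        return list(records)  # keep input order
    if sort == "length":
        return sorted(records, key=lambda r: len(r[1]), reverse=True)
    if sort == "size":
        return sorted(records, key=lambda r: size_of(r[0]), reverse=True)
    raise ValueError(f"unknown sort order: {sort}")


records = [
    ("A;size=5;", "ACGT"),
    ("B;size=20;", "AC"),
    ("C;size=1;", "ACGTAC"),
]
for mode in ("other", "length", "size"):
    print(mode, [label for label, _ in processing_order(records, mode)])
```

Because the first sequence processed in a cluster becomes its centroid in a greedy scheme, the sort order influences which sequences end up as centroids.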

Reverse-complemented matching for nucleotide sequences can be specified by using -strand both.

Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
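As a rough illustration of generating versus propagating sizes, the Python sketch below computes a cluster size either by counting members (no input sizes used) or by summing the members' size annotations, then writes the result into the centroid label. The size=N; label format, the default of 1 for unannotated labels, and the helper names are assumptions made for illustration, not a description of USEARCH internals.

```python
import re

# Assumed annotation format: "size=N" delimited by semicolons in the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def size_of(label: str) -> int:
    """Return the size annotation in a label, or 1 if absent (assumption)."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1


def cluster_size(member_labels, sizein=True):
    """Sum input sizes when sizein is set, otherwise count members."""
    if sizein:
        return sum(size_of(label) for label in member_labels)
    return len(member_labels)


def with_size(label: str, size: int) -> str:
    """Replace or append a size annotation on a label."""
    base = SIZE_RE.sub(";", label).strip(";")
    return f"{base};size={size};"


members = ["A;size=5;", "B;size=2;", "C"]
print(with_size(members[0], cluster_size(members, sizein=True)))
print(with_size(members[0], cluster_size(members, sizein=False)))
```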

Output files
Standard output files are supported. Cluster centroids (representative sequences) are written to a FASTA file specified by the -centroids option. Consensus sequences are written to a FASTA file specified by -consout and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can specify a directory to contain one FASTA file per cluster using the -clusters option.

Supported options
Accept options
Termination options
Indexing options
Masking options
Multithreading
Alignment parameters
Alignment heuristics

Example

usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
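For scripted pipelines, the command line can be assembled programmatically. The following Python sketch builds the argument list using only options named on this page; the function cluster_fast_cmd and its parameter names are hypothetical, and it assumes a usearch binary is available on the PATH if the command is actually run.

```python
import shlex


def cluster_fast_cmd(input_path, identity, centroids=None, uc=None,
                     sort=None, strand_both=False,
                     sizein=False, sizeout=False):
    """Build a usearch -cluster_fast argument list (does not run it)."""
    cmd = ["usearch", "-cluster_fast", input_path, "-id", str(identity)]
    if centroids:
        cmd += ["-centroids", centroids]
    if uc:
        cmd += ["-uc", uc]
    if sort:
        cmd += ["-sort", sort]  # other, length or size
    if strand_both:
        cmd += ["-strand", "both"]
    if sizein:
        cmd.append("-sizein")
    if sizeout:
        cmd.append("-sizeout")
    return cmd


cmd = cluster_fast_cmd("query.fasta", 0.9,
                       centroids="nr.fasta", uc="clusters.uc")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids shell quoting problems with file names that contain spaces.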