Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed.
An identity threshold must be specified using the -id option.
Sequences are processed in the order specified by the -sort option, which may be other (the default), length or size. See UCLUST sort order for discussion. With -sort length, sequences are processed in order of decreasing length, which is most appropriate when fragments are present together with full-length sequences. With -sort size, sequences are processed in order of decreasing size annotation, which can be useful for clustering amplicon reads such as 16S or ITS tags, though cluster_otus is usually recommended for this task. With -sort other, sequences are processed in the order they appear in the input file.
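The three sort orders can be illustrated with a small Python sketch. This is not USEARCH code: the function names are hypothetical, and the sketch assumes that a label without a size annotation counts as size 1 and that ties keep their input order.

```python
import re

# Matches a size annotation such as ";size=123;" inside a sequence label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")

def size_of(label):
    """Return the size annotation in a label, or 1 if none is present (assumption)."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1

def processing_order(records, sort="other"):
    """Order (label, sequence) pairs as described for each -sort value."""
    if sort == "length":
        # Decreasing sequence length; sorted() is stable, so ties keep input order.
        return sorted(records, key=lambda r: len(r[1]), reverse=True)
    if sort == "size":
        # Decreasing size annotation.
        return sorted(records, key=lambda r: size_of(r[0]), reverse=True)
    if sort == "other":
        # Input order, unchanged.
        return list(records)
    raise ValueError("sort must be 'other', 'length' or 'size'")

records = [
    ("frag;size=40;", "ACGT"),
    ("full;size=3;", "ACGTACGTAC"),
    ("mid;size=12;", "ACGTACG"),
]

for mode in ("other", "length", "size"):
    print(mode, [label for label, _ in processing_order(records, mode)])
```

With -sort length the full-length record is processed first, so it is the one that can become a centroid and the fragment can then match it.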
Reverse-complemented matching for nucleotide sequences can be specified by using -strand both.
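As a rough illustration of what matching on both strands means, the Python sketch below tests a query against a target in the forward orientation and then as its reverse complement. Exact string equality stands in for the real identity test, and the function names are hypothetical.

```python
# Translation table for the four unambiguous nucleotide letters (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def matching_strand(query, target):
    """Return '+' for a forward match, '-' for a reverse-complement match, else None.

    Exact equality is a stand-in for the identity threshold used in clustering.
    """
    if query == target:
        return "+"
    if reverse_complement(query) == target:
        return "-"
    return None

print(reverse_complement("AACG"))       # CGTT
print(matching_strand("CGTT", "AACG"))  # -
```

In this sketch a forward-only search corresponds to stopping after the first comparison.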
Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
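The following Python sketch shows one way to think about how size annotations are generated and propagated. It rests on assumptions that should be checked against the size annotation documentation: annotations have the form ;size=N; in the label, a cluster's size is the sum of its members' sizes when input sizes are honoured, and every sequence counts as 1 otherwise. The helper names are hypothetical.

```python
def split_label(label):
    """Split a label into (label without size field, size or None)."""
    size = None
    rest = []
    for field in label.split(";"):
        if not field:
            continue  # skip empty pieces produced by leading/trailing ';'
        if field.startswith("size=") and field[5:].isdigit():
            size = int(field[5:])
        else:
            rest.append(field)
    return ";".join(rest), size

def cluster_size(member_labels, sizein):
    """Sum member sizes if sizein is set; otherwise count each member as 1."""
    total = 0
    for label in member_labels:
        _, size = split_label(label)
        total += size if (sizein and size is not None) else 1
    return total

def annotate(label, size):
    """Return the label with its size annotation replaced by the given size."""
    base, _ = split_label(label)
    return f"{base};size={size};"

members = ["readA;size=10;", "readB;size=4;", "readC"]
print(annotate(members[0], cluster_size(members, sizein=True)))   # readA;size=15;
print(annotate(members[0], cluster_size(members, sizein=False)))  # readA;size=3;
```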
See search flowchart for an overview of searching in USEARCH commands. Searching is used to match input sequences to existing cluster centroids.
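The role of searching in clustering can be sketched as a greedy loop in Python: each sequence, taken in processing order, either joins an existing centroid that meets the identity threshold or becomes a new centroid. This is a simplified illustration, not the UCLUST variant itself. The identity function is a toy position-by-position comparison rather than an alignment, and the sketch checks centroids exhaustively in creation order, whereas the real search uses heuristics for speed.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest

def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid meeting the threshold."""
    centroids = []  # centroid sequences, in order of creation
    clusters = []   # clusters[i] holds the members of centroids[i]
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid matched: this sequence starts a new cluster.
            centroids.append(seq)
            clusters.append([seq])
    return centroids, clusters

seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "ACGTACGTAC"]
centroids, clusters = greedy_cluster(seqs, threshold=0.9)
print(centroids)
print([len(c) for c in clusters])
```

Because assignment is greedy, the result depends on the processing order, which is why the choice of -sort matters.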
Standard output files are supported. Cluster centroids (representative sequences) are written to a FASTA file specified by the -centroids option. Consensus sequences are written to a FASTA file specified by -consout and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can specify a directory to contain one FASTA file per cluster using the -clusters option.
usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
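The clusters.uc file written by this command can be summarised with a few lines of Python. The sketch assumes the tab-separated .uc layout in which the first field is the record type (S for a centroid, H for a hit to a centroid, C for a cluster summary) and the second field is the zero-based cluster number. The sample records are made up for illustration, and cluster_sizes is a hypothetical helper.

```python
import csv
from collections import Counter

def cluster_sizes(uc_lines):
    """Count sequences per cluster from S (centroid) and H (hit) records."""
    sizes = Counter()
    for row in csv.reader(uc_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if row[0] in ("S", "H"):
            sizes[int(row[1])] += 1
    return dict(sizes)

# Illustrative records only; real files are produced by the -uc option.
sample = "\n".join("\t".join(fields) for fields in [
    ["S", "0", "10", "*", "*", "*", "*", "*", "seq1", "*"],
    ["H", "0", "10", "90.0", "+", "0", "0", "10M", "seq2", "seq1"],
    ["S", "1", "10", "*", "*", "*", "*", "*", "seq3", "*"],
    ["C", "0", "2", "*", "*", "*", "*", "*", "seq1", "*"],
    ["C", "1", "1", "*", "*", "*", "*", "*", "seq3", "*"],
])

print(cluster_sizes(sample.splitlines()))  # {0: 2, 1: 1}
```

To summarise a real file, pass an open file handle instead, for example cluster_sizes(open("clusters.uc")).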