The cluster_fast command clusters sequences in a FASTA or
FASTQ file using a variant of the UCLUST algorithm designed to maximize
speed. An identity threshold must be specified using the -id option.
Sequences are processed in the order specified by the -sort
option, which may be other (the default), length or size. See
UCLUST sort order for discussion.
If -sort length is specified, sequences are processed in order of decreasing
length. This is most appropriate when fragments are present together with
full-length sequences.
If -sort size is specified, sequences are processed in order of decreasing
size annotation. This can be useful for clustering amplicon reads such as 16S
or ITS tags, though cluster_otus is usually recommended for this task.
If -sort other is used, the input sequences are processed in the order they
appear in the input file.
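
The three sort orders can be illustrated with a short Python sketch. It is not part of USEARCH; the function names are hypothetical, and it assumes size annotations appear in labels as ";size=N;" and that a label without an annotation counts as size 1.

```python
import re

# Assumption: size annotations look like ";size=N;" inside the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def size_of(label):
    """Return the size annotation in a label, or 1 if none is found (assumed default)."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else 1


def processing_order(records, sort="other"):
    """Order (label, sequence) pairs the way a given -sort setting is described.

    sorted() is stable, so records with equal keys keep their input-file order.
    """
    if sort == "other":
        return list(records)  # input-file order
    if sort == "length":
        return sorted(records, key=lambda r: len(r[1]), reverse=True)
    if sort == "size":
        return sorted(records, key=lambda r: size_of(r[0]), reverse=True)
    raise ValueError(f"unknown sort order: {sort!r}")


records = [
    ("frag;size=5;", "ACGT"),
    ("full;size=2;", "ACGTACGTAC"),
    ("mid;size=9;", "ACGTACG"),
]

by_length = [label for label, _ in processing_order(records, "length")]
by_size = [label for label, _ in processing_order(records, "size")]
```

With -sort length the full-length record comes first, so a fragment cannot become the centroid of a cluster that also contains its full-length parent.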
Reverse-complemented matching for nucleotide sequences can be specified by using
the -strand both option. Size annotations may be
generated and/or propagated by using the -sizein
and/or -sizeout options.
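
As a rough illustration of what propagating size annotations could mean, the Python sketch below sums the sizes of a cluster's members and writes the total back into a label. It is not USEARCH code; the helper names are hypothetical, and the ";size=N;" label format, the default size of 1, and the summing rule are assumptions made for the example.

```python
def read_size(label):
    """Return N from a 'size=N' field in a ';'-separated label, else 1 (assumed default)."""
    for field in label.split(";"):
        if field.startswith("size=") and field[5:].isdigit():
            return int(field[5:])
    return 1


def with_size(label, size):
    """Return the label with any existing size field replaced by ';size=<size>;'."""
    kept = [f for f in label.split(";") if f and not f.startswith("size=")]
    return ";".join(kept) + f";size={size};"


def cluster_size(member_labels, sizein=True):
    """Sum the members' sizes when input sizes are honored, else count the members."""
    if sizein:
        return sum(read_size(label) for label in member_labels)
    return len(member_labels)


members = ["a;size=3;", "b;size=4;", "c"]
total = cluster_size(members)                 # 3 + 4 + 1
centroid_label = with_size(members[0], total)
```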
See search flowchart
for an overview of searching in USEARCH commands. Searching is used to match input
sequences to existing cluster centroids.
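
The idea of matching each input sequence against existing centroids can be sketched as a simple greedy loop in Python. This is a toy model, not the UCLUST variant used by the command: the identity function compares positions without any alignment, and the first centroid that reaches the threshold wins. Both choices, and all names, are assumptions made for illustration.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions over the shorter sequence.

    No alignment is performed; this is a simplification for illustration.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs, min_id):
    """Assign each sequence to the first centroid reaching min_id, else start a new cluster."""
    centroids = []   # representative sequences, in creation order
    assignment = []  # cluster index for each input sequence
    for seq in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= min_id:
                assignment.append(idx)
                break
        else:
            # No centroid matched: this sequence becomes a new centroid.
            centroids.append(seq)
            assignment.append(len(centroids) - 1)
    return centroids, assignment


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "ACGTACG"]
centroids, assignment = greedy_cluster(seqs, min_id=0.9)
```

Because a sequence that matches no centroid becomes a centroid itself, the processing order determines which sequences end up as representatives.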
Standard output files are supported. Cluster
centroids (representative sequences) are written to a FASTA file specified by
the -centroids option. Consensus sequences are written to a FASTA file specified
by -consout and multiple alignments are written
to filenames derived from the -msaout option. Note
that using -consout and -msaout may add significantly to the compute time and
memory required for clustering. You can specify a directory to contain one FASTA
file per cluster using the -clusters option.
usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
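
For scripted pipelines, the example command can be assembled as an argument list before being passed to a process launcher. The Python sketch below uses only options that appear on this page; the helper name, the executable name "usearch" on the PATH, and the 0.0 to 1.0 range check on the identity value are assumptions.

```python
import shlex


def cluster_fast_argv(fasta, identity, centroids=None, uc=None, sort=None, exe="usearch"):
    """Build an argument list for a cluster_fast run (hypothetical helper)."""
    # Assumption: identity is a fraction, as in the "-id 0.9" example.
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0.0 and 1.0")
    argv = [exe, "-cluster_fast", fasta, "-id", str(identity)]
    if sort is not None:
        argv += ["-sort", sort]
    if centroids:
        argv += ["-centroids", centroids]
    if uc:
        argv += ["-uc", uc]
    return argv


argv = cluster_fast_argv("query.fasta", 0.9, centroids="nr.fasta", uc="clusters.uc")
command_line = shlex.join(argv)  # printable form of the same command
```

The list can be handed to subprocess.run(argv, check=True) on a machine where the executable is installed; passing a list rather than a single string avoids shell quoting problems with unusual file names.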