usearch manual
cluster_smallmem command
See also
  cluster_smallmem
  cluster_otus
  cluster_agg
  cluster_aggd

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to minimize memory use.

It is the user's responsibility to sort the input sequences in an appropriate order before running cluster_smallmem; see UCLUST sort order for discussion. By default, input sequences are expected to be sorted by decreasing length. If some other sort order is used, the -sortedby option should be specified. Valid values are length (default), size and other. If -sortedby other is specified, USEARCH does not assume or check for any particular order. See also sortbysize and sortbylength.
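Because the expected order is the user's responsibility, it can help to check a file before clustering. The following is a minimal Python sketch, not part of USEARCH, that reports the first record breaking a decreasing-length order. The function names read_fasta_lengths and first_out_of_order are hypothetical, and the sketch assumes plain FASTA input (FASTQ is not handled).

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (label, sequence_length) for each record in FASTA-formatted lines."""
    label: Optional[str] = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, length  # emit the record that just ended
            label = line[1:]
            length = 0
        elif label is not None:
            length += len(line)  # sequences may span several lines
    if label is not None:
        yield label, length


def first_out_of_order(lines: Iterable[str]) -> Optional[str]:
    """Return the label of the first record longer than its predecessor, else None."""
    previous: Optional[int] = None
    for label, length in read_fasta_lengths(lines):
        if previous is not None and length > previous:
            return label
        previous = length
    return None


sorted_fasta = [">a", "ACGTACGT", ">b", "ACGT", "AC", ">c", "ACG"]
unsorted_fasta = [">a", "ACG", ">b", "ACGTACGT"]
print(first_out_of_order(sorted_fasta))    # None
print(first_out_of_order(unsorted_fasta))  # b
```

To check a real file, pass an open file handle instead of a list, for example first_out_of_order(open("query.fasta")). A result of None means the default -sortedby length expectation holds for that file.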

An identity threshold must be specified using the -id option.

Multithreading is not supported as this would require significant memory overhead.

By default, nucleotide matching is done on the forward strand only. For matching on both strands, use -strand both.
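To illustrate why strand matters, here is a small Python sketch of the reverse complement of a nucleotide sequence. It is an illustration only and says nothing about how USEARCH implements strand handling; the names reverse_complement, centroid and query are hypothetical, and only the unambiguous letters A, C, G, T are translated.

```python
# Translation table for the four unambiguous nucleotides (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


centroid = "AAACCCGGGT"
query = "ACCCGGGTTT"  # the same molecule as centroid, read from the other strand

print(query == centroid)                      # False
print(reverse_complement(query) == centroid)  # True
```

A query like this one is identical to the centroid only after reverse-complementing, which is the kind of match that a forward-only comparison does not consider.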

See search flowchart for an overview of searching in USEARCH commands. Searching is used to match input sequences to existing cluster centroids.

See also
  Standard output file options
  Accept options
  Indexing options
  Termination options
  Masking options
  Alignment parameters
  Alignment heuristics

  Cluster sizes
  Memory requirements

Example

usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
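
For scripted use, the command line above can be assembled programmatically. The following Python sketch is an assumption-laden convenience wrapper, not an official interface: build_cluster_smallmem_cmd and its parameter names are hypothetical, it uses only the options mentioned on this page, and it assumes the identity threshold is a fraction between 0.0 and 1.0.

```python
import shlex
from typing import List


def build_cluster_smallmem_cmd(
    input_path: str,
    identity: float,
    centroids_path: str,
    uc_path: str,
    sortedby: str = "length",
    both_strands: bool = False,
    usearch_bin: str = "usearch",
) -> List[str]:
    """Build an argument list for cluster_smallmem from options named on this page."""
    if not 0.0 <= identity <= 1.0:
        # Assumption: -id takes a fractional identity such as 0.9.
        raise ValueError("identity must be between 0.0 and 1.0")
    if sortedby not in ("length", "size", "other"):
        raise ValueError("sortedby must be 'length', 'size' or 'other'")

    cmd = [
        usearch_bin, "-cluster_smallmem", input_path,
        "-id", str(identity),
        "-centroids", centroids_path,
        "-uc", uc_path,
    ]
    if sortedby != "length":
        cmd += ["-sortedby", sortedby]  # length is the default, so it is omitted
    if both_strands:
        cmd += ["-strand", "both"]      # default is forward strand only
    return cmd


cmd = build_cluster_smallmem_cmd("query.fasta", 0.9, "nr.fasta", "clusters.uc")
print(shlex.join(cmd))
# usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where the usearch binary is installed and on the PATH.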