derep_prefix command
 
See also: Global trimming

Dereplicates sequences using prefix matching.
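As a rough illustration of what prefix dereplication means, the sketch below treats a sequence as a duplicate when it is identical to the start of a longer (or equal-length) sequence that has already been kept. This is not the usearch implementation: the function name `derep_prefix`, the quadratic search, and the rule for a sequence that is a prefix of several representatives (it joins the first one found, processing longest first) are all assumptions made for the example.

```python
def derep_prefix(seqs):
    """Toy prefix dereplication (illustrative only, not usearch's algorithm).

    Returns a list of (representative, cluster_size) pairs sorted by
    decreasing cluster size.
    """
    # Visit longest sequences first so a representative is always seen
    # before any of its prefixes.
    ordered = sorted(seqs, key=len, reverse=True)
    reps = []    # representative sequences, in the order they were kept
    sizes = {}   # representative -> number of sequences in its cluster
    for s in ordered:
        for r in reps:
            # ASSUMPTION: s joins the first representative it is a prefix of.
            if r.startswith(s):
                sizes[r] += 1
                break
        else:
            # No kept sequence starts with s, so s becomes a new unique.
            reps.append(s)
            sizes[s] = 1
    # Stable sort keeps insertion order among clusters of equal size.
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)


reads = ["ACGT", "ACG", "ACGT", "AC", "TTGA", "TTG", "GG"]
uniques = derep_prefix(reads)
```

Here "ACG" and "AC" are absorbed into the cluster of "ACGT", and "TTG" into that of "TTGA", while "GG" remains a singleton.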

Multithreading is supported.

Specifying -minseqlength L, which discards sequences shorter than L, is generally recommended, especially if the input contains very short sequences that are not needed in the output. This can significantly improve speed.

The -output option specifies a FASTA file to which the unique sequences are written. Sequences are sorted by decreasing cluster size.

The -uc output file is supported; other standard output files are not.

The -sizeout option adds size annotations to the unique sequence labels. The -sizein option is not supported. See cluster sizes.
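To give a sense of what a size-annotated label might look like, the sketch below appends a `;size=N;` field to each label when writing FASTA records. The `;size=N;` layout is assumed here from the usual usearch convention and should be checked against the cluster sizes page; the helper name `format_fasta` and the sample labels are hypothetical.

```python
def format_fasta(clusters):
    """Render (label, sequence, size) tuples as size-annotated FASTA text.

    ASSUMPTION: the annotation has the form ';size=N;' appended to the label.
    """
    records = []
    for label, seq, size in clusters:
        # Drop any trailing semicolon so only one separator precedes 'size='.
        records.append(f">{label.rstrip(';')};size={size};\n{seq}\n")
    return "".join(records)


fasta_text = format_fasta([("Read1", "ACGT", 4), ("Read5;", "TTGA", 2)])
```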

The -minuniquesize option sets a minimum cluster size; unique sequences whose clusters are smaller are not written to the output file.

Reverse-complemented dereplication is not supported, so -strand both is not allowed.

The -topn N option specifies that only the largest N clusters will be written to the output file (v6.0.235 or later).
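The combined effect of a minimum cluster size and a top-N limit can be sketched as a filter followed by a sort and a truncation. This is only an illustration under stated assumptions: the function `select_clusters` and its parameter names are hypothetical, and applying the size filter before the top-N cut is an assumed order, not something the manual specifies.

```python
def select_clusters(clusters, min_unique_size=1, top_n=None):
    """Pick which (label, size) clusters would be written to the output.

    ASSUMPTION: the minimum-size filter is applied before the top-N cut.
    """
    # Drop clusters smaller than the minimum size.
    kept = [c for c in clusters if c[1] >= min_unique_size]
    # Order by decreasing cluster size (stable for equal sizes).
    kept.sort(key=lambda c: c[1], reverse=True)
    # Keep only the largest top_n clusters when a limit is given.
    return kept if top_n is None else kept[:top_n]


clusters = [("a", 1), ("b", 5), ("c", 3), ("d", 2)]
selected = select_clusters(clusters, min_unique_size=2, top_n=2)
```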

Example

usearch -derep_prefix input.fasta -output uniques.fasta -sizeout -minseqlength 64