Find the set of unique sequences in an input file, also called
dereplication. Input is a FASTA or FASTQ file. Sequences are compared
letter-by-letter and must be identical over the full length of both
sequences (substrings do not match). Case is ignored, so an upper-case
letter matches a lower-case letter.
All 26 letters of the English alphabet are treated in the same way, so there is
no concept of a biological alphabet or of wildcard matching (unless -strand both
is used).
Multithreading is supported.
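The comparison rule described above can be illustrated with a short Python sketch. This is not how usearch is implemented; it is a minimal model that assumes records are already parsed into (label, sequence) pairs, and the function name dereplicate is hypothetical. How ties in abundance are ordered is also an assumption (first-seen order here).

```python
def dereplicate(records):
    """Group (label, sequence) pairs by case-insensitive, full-length identity."""
    clusters = {}  # upper-cased sequence -> [first label, first sequence, count]
    for label, seq in records:
        key = seq.upper()  # case is ignored; every letter is compared literally
        if key in clusters:
            clusters[key][2] += 1
        else:
            clusters[key] = [label, seq, 1]
    uniques = [tuple(v) for v in clusters.values()]
    # Decreasing abundance; the stable sort keeps first-seen order for ties
    # (tie order is an assumption of this sketch).
    uniques.sort(key=lambda u: -u[2])
    return uniques


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "ACG"),
         ("r4", "ACGN"), ("r5", "ACGT")]
uniques = dereplicate(reads)
for label, seq, count in uniques:
    print(label, seq, count)
```

In this sketch r2 and r5 collapse into r1, while ACG (a substring) and ACGN (N is treated as an ordinary letter, not a wildcard) stay separate.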
The -fastaout option specifies a FASTA output file for the unique sequences.
Sequences are sorted by decreasing abundance.
The -fastqout option specifies a FASTQ output file for the unique sequences.
Sequences are sorted by decreasing abundance.
The -tabbedout option specifies an output file in tabbed
text format. The fields are: 1. input label, 2. output label (this is the input
label of the first occurrence of the sequence, or the new label assigned to it
if the -relabel option is used), 3. cluster number (zero-based, so 0 is the
first unique sequence found, 1 is the second etc.), 4. member number in the
cluster (zero-based), 5. input label of the first occurrence of the sequence
(only if -relabel is specified).
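The field layout above can be mocked up in Python to make the numbering concrete. This is a sketch under stated assumptions, not the tool's code: tabbed_rows is a hypothetical helper, and when relabel is given the new labels are numbered in order of discovery, which is a simplification because the text does not say how the integer is assigned.

```python
def tabbed_rows(records, relabel=None):
    """Build one tab-separated row per input record, following the field list."""
    first_label = {}  # sequence key -> input label of its first occurrence
    cluster_of = {}   # sequence key -> zero-based cluster number
    members = {}      # sequence key -> number of members seen so far
    rows = []
    for label, seq in records:
        key = seq.upper()
        if key not in cluster_of:
            cluster_of[key] = len(cluster_of)  # 0 for the first unique found
            first_label[key] = label
            members[key] = 0
        cluster = cluster_of[key]
        member = members[key]  # zero-based member number within the cluster
        members[key] += 1
        if relabel is None:
            fields = [label, first_label[key], str(cluster), str(member)]
        else:
            # Assumption: new labels are numbered by discovery order.
            new_label = f"{relabel}{cluster + 1}"
            fields = [label, new_label, str(cluster), str(member),
                      first_label[key]]
        rows.append("\t".join(fields))
    return rows


reads = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "acgt")]
plain = tabbed_rows(reads)
relabeled = tabbed_rows(reads, relabel="Uniq")
print("\n".join(plain))
```

Note that the fifth field appears only in the relabeled rows, matching the description of field 5.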
The -uc output file is supported, but other standard output files are not.
The -sizeout option specifies that size annotations should be added to the
output sequence labels.
The -relabel option specifies a string that is used to
re-label the dereplicated sequences. An integer is appended to the label.
For example, -relabel Uniq generates the sequence labels Uniq1, Uniq2, etc. By default,
the label of the first occurrence of the sequence is used.
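How -sizeout and -relabel combine on a label can be sketched as follows. The helper output_labels is hypothetical, and the annotation form ";size=N;" is an assumption of this sketch, since the text above does not spell out the annotation syntax. The input is assumed to be already sorted by decreasing abundance.

```python
def output_labels(uniques, relabel=None, sizeout=False):
    """Return output labels for (first_label, count) pairs, in the given order."""
    labels = []
    for i, (first_label, count) in enumerate(uniques, start=1):
        # With relabel, an integer is appended to the prefix; otherwise the
        # label of the first occurrence is kept.
        label = f"{relabel}{i}" if relabel is not None else first_label
        if sizeout:
            # Assumed annotation syntax; not taken from the text above.
            label += f";size={count};"
        labels.append(label)
    return labels


uniques = [("r1", 3), ("r7", 1)]
default_labels = output_labels(uniques)
annotated = output_labels(uniques, relabel="Uniq", sizeout=True)
print(default_labels, annotated)
```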
The -minuniquesize option sets a
minimum abundance. Unique sequences with a lower abundance are discarded.
Default is 1, which means that all unique sequences are output.
The -topn N option specifies that only the first N
sequences in order of decreasing abundance will be written to the output file.
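The effect of -minuniquesize and -topn can be modeled as a filter followed by a truncation of the abundance-sorted list. The sketch below is illustrative only; select_uniques is a hypothetical name, and the input is assumed to be a list of (label, abundance) pairs.

```python
def select_uniques(uniques, minuniquesize=1, topn=None):
    """Drop low-abundance uniques, then keep the N most abundant."""
    # Uniques with abundance below the minimum are discarded.
    kept = [u for u in uniques if u[1] >= minuniquesize]
    # Order by decreasing abundance (stable, so ties keep input order).
    kept.sort(key=lambda u: -u[1])
    if topn is not None:
        kept = kept[:topn]
    return kept


uniques = [("a", 5), ("b", 1), ("c", 3), ("d", 2)]
at_least_two = select_uniques(uniques, minuniquesize=2)
top_two = select_uniques(uniques, topn=2)
print(at_least_two, top_two)
```

With the default minuniquesize of 1 and no topn, every unique sequence is kept, which mirrors the default behavior described above.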
Reverse-complemented matching for nucleotide sequences is supported by
specifying -strand both.
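One common way to model reverse-complemented matching is to map each sequence and its reverse complement to the same lookup key. The sketch below uses that technique as an illustration; it is not a description of the tool's internals. The helpers revcomp and strand_key are hypothetical, and only A, C, G and T are complemented here (other letters are left unchanged, which is an assumption).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a nucleotide sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_key(seq, both=False):
    """Key used to decide whether two sequences count as identical."""
    s = seq.upper()
    # With both strands, a sequence and its reverse complement share the
    # lexicographically smaller of the two as their key.
    return min(s, revcomp(s)) if both else s


plus_only_match = strand_key("AAAC") == strand_key("GTTT")
both_match = strand_key("AAAC", both=True) == strand_key("GTTT", both=True)
print(plus_only_match, both_match)
```

Here AAAC and GTTT are reverse complements of each other, so they differ on the plus strand alone but share a key when both strands are considered.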
usearch -fastx_uniques input.fasta -fastaout uniques.fasta -sizeout