derep_fulllength command

See also
Global trimming

Dereplicates sequences using full-length matching.

The -minseqlength option sets the minimum length a sequence must have to be included in the output.

The -fastaout option specifies a FASTA file to contain the unique sequences. Sequences are sorted by decreasing cluster size.

The -fastqout option specifies a FASTQ file to contain the unique sequences. Sequences are sorted by decreasing cluster size.
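The dereplication and ordering described above can be sketched in a few lines of Python. This is an illustrative model only, not the USEARCH implementation: the function name `dereplicate`, the `(label, sequence)` input format, and the choice to keep the first-seen label for each unique sequence are assumptions made for the example.

```python
from collections import Counter


def dereplicate(records, min_seq_length=1):
    """Group identical sequences and count them.

    records: iterable of (label, sequence) pairs (hypothetical input format).
    Returns a list of (label, sequence, size) sorted by decreasing size.
    """
    sizes = Counter()
    first_label = {}
    for label, seq in records:
        # Sequences shorter than the minimum never reach the output.
        if len(seq) < min_seq_length:
            continue
        # Full-length matching: two reads match only if the whole
        # sequence strings are equal.
        sizes[seq] += 1
        first_label.setdefault(seq, label)
    # sorted() is stable, so ties keep first-seen order (an assumption
    # of this sketch, not documented behavior).
    ordered = sorted(sizes.items(), key=lambda kv: -kv[1])
    return [(first_label[seq], seq, n) for seq, n in ordered]
```

Calling `dereplicate(reads, min_seq_length=64)` on a list of reads mirrors the roles of -fastaout and -minseqlength in the command: one record per distinct sequence, largest cluster first.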

The -uc output file is supported; other standard output files are not.

The -sizeout option may be used to specify that size annotations should be added to the unique sequence labels. See cluster sizes.

The -relabel option specifies a string that is used to re-label the dereplicated sequences. An integer is appended to the string to form each label. For example, -relabel D_ will generate the sequence labels D_1, D_2, and so on.
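As a rough illustration of how relabeling and size annotations combine, the sketch below builds the new labels in Python. The helper name `relabel` is hypothetical, and the annotation format `;size=N;` is an assumption about how the size annotation is written; check the cluster sizes documentation for the exact form.

```python
def relabel(uniques, prefix, sizeout=True):
    """Build new labels for dereplicated sequences.

    uniques: list of (old_label, sequence, size), already sorted by
             decreasing size (hypothetical input format).
    prefix:  the string that would be given to -relabel, e.g. "D_".
    """
    out = []
    # Numbering starts at 1, matching the D_1, D_2 example.
    for i, (_old_label, seq, size) in enumerate(uniques, start=1):
        label = f"{prefix}{i}"
        if sizeout:
            # Assumed annotation format; not taken from the text above.
            label += f";size={size};"
        out.append((label, seq))
    return out
```

Because the uniques are sorted by decreasing cluster size before numbering, the label with the lowest integer belongs to the largest cluster in this sketch.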

The -minuniquesize option may be used to set a minimum size for a cluster; unique sequences with smaller clusters are not included in the output file.

Reverse-complemented dereplication is supported by specifying -strand both.
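One way to picture dereplication on both strands is to treat a sequence and its reverse complement as the same key. The Python sketch below does this for plain A, C, G, T sequences. It is a conceptual model, not a description of how USEARCH implements -strand both; the function names and the choice to report the first-seen orientation are assumptions.

```python
# Translation table for complementing unambiguous nucleotides.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dereplicate_both_strands(seqs):
    """Count sequences, merging each one with its reverse complement.

    Returns a list of (sequence, size) sorted by decreasing size, where
    sequence is the orientation that was seen first (an assumption).
    """
    clusters = {}  # canonical key -> [first-seen sequence, count]
    for seq in seqs:
        # Use the lexicographically smaller strand as the shared key.
        key = min(seq, reverse_complement(seq))
        if key in clusters:
            clusters[key][1] += 1
        else:
            clusters[key] = [seq, 1]
    ordered = sorted(clusters.values(), key=lambda c: -c[1])
    return [(seq, n) for seq, n in ordered]
```

Here `AACG` and `CGTT` fall into one cluster because each is the reverse complement of the other.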

The -topn N option specifies that only the largest N clusters will be written to the output file.
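The two size-based filters, -minuniquesize and -topn, can be modeled as a threshold followed by a cut on the sorted list. The Python sketch below uses a hypothetical helper `select_clusters` and a hypothetical `(sequence, size)` input format; the order in which the real program applies the two options is not stated above and is assumed here.

```python
def select_clusters(uniques, min_unique_size=1, top_n=None):
    """Filter dereplicated sequences by cluster size.

    uniques: list of (sequence, size) pairs (hypothetical input format).
    min_unique_size: smallest cluster size to keep.
    top_n: keep only the N largest clusters, or all if None.
    """
    # Drop clusters below the minimum size.
    kept = [u for u in uniques if u[1] >= min_unique_size]
    # Largest clusters first; the sort is stable, so ties keep input order.
    kept.sort(key=lambda u: -u[1])
    # Keep at most top_n clusters when a limit is given.
    return kept if top_n is None else kept[:top_n]
```

On a list sorted by decreasing size, applying the threshold first or the cut first selects the same clusters, since the clusters that pass the threshold always form a prefix of the sorted list.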

Example

usearch -derep_fulllength input.fasta -fastaout uniques.fasta -sizeout -minseqlength 64