fasta_rarify command

**See also**

Rarefaction

Abundance rarefaction

fastx_subsample command

The fasta_rarify command computes a rarefaction curve from the size annotations in a FASTA file. This is a fast, approximate way to generate an abundance curve for OTUs, and it is especially useful when singletons are discarded, as recommended in the UPARSE pipeline. See abundance rarefaction for the correct, but more computationally expensive, procedure.
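The input abundances come from the size annotations in the FASTA labels (for example `>Uniq1;size=42;`). The following minimal Python sketch shows how those sizes might be read. The function name `read_sizes` is hypothetical, and treating a record with no `size=` annotation as size 1 is an assumption, not documented USEARCH behavior.

```python
import re

# USEARCH-style labels carry the abundance as "size=N", e.g. ">Uniq1;size=42;".
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def read_sizes(lines):
    """Return the size annotation of every FASTA record, in file order.

    Hypothetical helper. Records whose label has no size=N annotation are
    counted as size 1 (an assumption made for this sketch).
    """
    sizes = []
    for line in lines:
        if line.startswith(">"):
            match = SIZE_RE.search(line[1:].strip())
            sizes.append(int(match.group(1)) if match else 1)
    return sizes
```

Because it accepts any iterable of lines, it works directly on an open file: `with open("otus.fa") as f: sizes = read_sizes(f)`.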

The -mingroupsize option specifies the minimum abundance a unique sequence must have to be counted as an observation. The default of 1 counts all sequences, so the results should be close to those predicted by the "standard" rarefaction formula. Set -mingroupsize to 2 to discard singletons before counting the number of uniques. The -iters option specifies the number of iterations to run for each subset size (0, 1%, 2%, ... 100% of the unique reads in the input file); the default is 32.
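The procedure can be illustrated with a minimal Python sketch. It is not the USEARCH implementation, and several details are assumptions: each unique is expanded into `size` reads, subsets are drawn without replacement, the subset size is rounded down, and `mingroupsize` is applied to each unique's count within the subset. The names `rarefy`, `expected_uniques` and the `seed` parameter are hypothetical. `expected_uniques` evaluates the classical expected-richness rarefaction formula, on the assumption that this is the "standard" formula referred to above; with `mingroupsize=1` the simulated averages should approach it.

```python
import random
from collections import Counter
from math import comb


def rarefy(sizes, mingroupsize=1, iters=32, seed=0):
    """Approximate rarefaction curve from size annotations (sketch only).

    Returns one (percent, subset_size, mean_uniques) row per percentage 0..100.
    """
    rng = random.Random(seed)
    # One entry per read, holding the index of the unique sequence it belongs to.
    reads = [i for i, s in enumerate(sizes) for _ in range(s)]
    total = len(reads)
    rows = []
    for pct in range(101):
        n = total * pct // 100  # assumed rounding rule for the subset size
        sum_uniques = 0
        for _ in range(iters):
            counts = Counter(rng.sample(reads, n))  # subsample without replacement
            # Count only uniques seen at least mingroupsize times in this subset.
            sum_uniques += sum(1 for c in counts.values() if c >= mingroupsize)
        rows.append((pct, n, sum_uniques / iters))
    return rows


def expected_uniques(sizes, n):
    """Classical formula: expected number of distinct uniques in n random reads."""
    total = sum(sizes)
    # math.comb returns 0 when n exceeds total - s, i.e. that unique is always seen.
    return sum(1 - comb(total - s, n) / comb(total, n) for s in sizes)
```

Expanding every read into a list is memory-hungry for large inputs; the sketch favors clarity over speed, which is consistent with fasta_rarify itself being described as a fast approximation rather than an exact method.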

Output is written in tab-separated text format to the file given by the -output option. Fields are:

1. Percentage of sequences in the subset.

2. Size of subset (total number of sequences).

3. Average number of unique sequences.

The output file can readily be imported into a spreadsheet or other software that can generate graphs.
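For scripted analysis, a minimal Python sketch can parse the output into rows matching the three fields listed above. The helper name `parse_rarefaction` is hypothetical, and since the exact layout of the file (for example whether it has a header line, or how numbers are formatted) is not specified here, the sketch simply skips any line that does not parse as three numbers.

```python
def parse_rarefaction(lines):
    """Parse -output lines into (percent, subset_size, mean_uniques) tuples.

    Hypothetical helper: assumes three tab-separated numeric fields per line,
    in the documented order. Lines that do not parse (e.g. a header) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        try:
            pct = float(fields[0].rstrip("%"))  # tolerate a trailing % sign
            size = int(float(fields[1]))
            uniques = float(fields[2])
        except ValueError:
            continue
        rows.append((pct, size, uniques))
    return rows
```

Usage: `with open("rare.txt") as f: rows = parse_rarefaction(f)`, after which the second and third columns can be passed to any plotting library.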

**Example**

usearch -fasta_rarify otus.fa -mingroupsize 2 -iters 100 -output rare.txt
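To run the example command from a Python pipeline, a small wrapper can build the same argument list. This is a sketch that assumes the `usearch` executable is on the PATH; the helper names are hypothetical, and only the options documented on this page are used, with their documented defaults (-mingroupsize 1, -iters 32).

```python
import subprocess


def fasta_rarify_args(fasta, output, mingroupsize=1, iters=32, exe="usearch"):
    """Build the argument list for usearch -fasta_rarify (hypothetical helper)."""
    return [exe, "-fasta_rarify", fasta,
            "-mingroupsize", str(mingroupsize),
            "-iters", str(iters),
            "-output", output]


def run_fasta_rarify(fasta, output, **kwargs):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(fasta_rarify_args(fasta, output, **kwargs), check=True)
```

`run_fasta_rarify("otus.fa", "rare.txt", mingroupsize=2, iters=100)` corresponds to the example command above.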