fasta_diversity command

The fasta_diversity command reports several single-sample (alpha) diversity metrics calculated from the number of sequences and their size annotations (abundances) in a FASTA file. Typically, the FASTA file is the output of the cluster_otus command run with the -sizein and -sizeout options.
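
The size annotations are stored in the sequence labels, for example >Otu1;size=120;. The Python sketch below shows one way to extract the abundances from FASTA text. The regular expression, the helper name read_abundances, and the choice to count an unannotated record as size 1 are assumptions made for illustration, not a description of how USEARCH parses its input.

```python
import re

# Matches a USEARCH-style size annotation such as ">Otu1;size=120;".
SIZE_RE = re.compile(r"[>;]size=(\d+)")

def read_abundances(lines):
    """Return one abundance per FASTA record, taken from its size=N annotation.

    Records with no annotation are counted as size 1 (an assumption of this sketch).
    """
    sizes = []
    for line in lines:
        if line.startswith(">"):
            m = SIZE_RE.search(line)
            sizes.append(int(m.group(1)) if m else 1)
    return sizes

fasta = """>Otu1;size=120;
ACGT
>Otu2;size=30;
ACGA
>Otu3;size=1;
ACGC
""".splitlines()
print(read_abundances(fasta))  # [120, 30, 1]
```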

The reported metrics include:

Richness index (number of OTUs).

Jost index (effective number of OTUs) of order q = 0, 1, 2, 3, and 4.

Shannon entropy index.

Simpson concentration index.

Gini-Simpson index.

HCDT entropy index.

Renyi entropy index.
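
For reference, the sketch below computes each listed metric from a list of abundances using standard textbook definitions. With p_i = n_i / N, the Jost index (Hill number) of order q is (sum p_i^q)^(1/(1-q)), Renyi entropy is its logarithm, and HCDT (Tsallis) entropy is (1 - sum p_i^q)/(q - 1), with q = 1 taken as the Shannon limit. The exact conventions used by USEARCH (log base, the order q reported for HCDT and Renyi entropy, and whether Simpson is computed with or without replacement) are not stated here, so treat these choices as assumptions.

```python
import math

def rel_freqs(sizes):
    """Relative abundances p_i = n_i / N, skipping OTUs with zero count."""
    total = sum(sizes)
    return [n / total for n in sizes if n > 0]

def richness(sizes):
    """Number of OTUs with non-zero abundance."""
    return sum(1 for n in sizes if n > 0)

def shannon(sizes):
    """Shannon entropy; natural log assumed (other bases only rescale it)."""
    return -sum(p * math.log(p) for p in rel_freqs(sizes))

def simpson(sizes):
    """Simpson concentration: sum of p_i^2 (draws with replacement)."""
    return sum(p * p for p in rel_freqs(sizes))

def gini_simpson(sizes):
    """Gini-Simpson index: 1 - Simpson concentration."""
    return 1.0 - simpson(sizes)

def power_sum(sizes, q):
    """Sum of p_i^q, shared by the order-q metrics below."""
    return sum(p ** q for p in rel_freqs(sizes))

def jost(sizes, q):
    """Effective number of OTUs of order q; q = 1 is exp(Shannon)."""
    if q == 1:
        return math.exp(shannon(sizes))
    return power_sum(sizes, q) ** (1.0 / (1.0 - q))

def renyi(sizes, q):
    """Renyi entropy of order q (log of the Jost index); q = 1 is Shannon."""
    if q == 1:
        return shannon(sizes)
    return math.log(power_sum(sizes, q)) / (1.0 - q)

def hcdt(sizes, q):
    """HCDT (Tsallis) entropy of order q; q = 1 is Shannon, q = 2 is Gini-Simpson."""
    if q == 1:
        return shannon(sizes)
    return (1.0 - power_sum(sizes, q)) / (q - 1.0)

sizes = [50, 25, 25]  # made-up abundances
print("richness", richness(sizes))
for q in range(5):
    print("jost q=%d" % q, round(jost(sizes, q), 3))
print("shannon", shannon(sizes), "simpson", simpson(sizes), "gini-simpson", gini_simpson(sizes))
```

Note that q = 0 recovers richness and q = 2 gives the reciprocal of the Simpson concentration, which is why the Jost index is often described as a unifying family.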

A rarefaction curve is calculated for each metric by randomized sub-sampling of the reads in 1% intervals from 0% to 100% (i.e., from 0% to 100% of the total abundance given by the size annotations). If a metric reaches a horizontal asymptote, this suggests that enough observations have been made to give a reasonable estimate. If the metric does not converge, more observations are needed to get a good estimate. For each sub-sample size, the metric is averaged over the number of repetitions specified by the -iters option (default 32).
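
The rarefaction procedure described above can be sketched as follows. This illustrates the idea and is not the USEARCH implementation: it assumes sub-samples are drawn without replacement from the reads implied by the size annotations, rounds each percentage to a whole number of reads, and uses richness as the example metric. The names rarefaction_curve and richness are hypothetical.

```python
import random
from collections import Counter

def richness(counts):
    """Number of OTUs with non-zero count; 0 for an empty sub-sample."""
    return sum(1 for n in counts if n > 0)

def rarefaction_curve(sizes, metric, iters=32, seed=None):
    """Return (pct, mean metric) for pct = 0, 1, ..., 100."""
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its OTU.
    reads = [otu for otu, n in enumerate(sizes) for _ in range(n)]
    total = len(reads)
    curve = []
    for pct in range(101):
        k = round(total * pct / 100)  # sub-sample size in reads
        acc = 0.0
        for _ in range(iters):
            # Draw k reads without replacement and count reads per OTU.
            counts = list(Counter(rng.sample(reads, k)).values())
            acc += metric(counts)
        curve.append((pct, acc / iters))
    return curve

curve = rarefaction_curve([50, 25, 25, 1, 1], richness, iters=32, seed=1)
print(curve[0], curve[50], curve[-1])
```

Expanding every read into a list keeps the sketch short but uses memory proportional to the total abundance; for very large files, a sampler that works directly on the counts would be preferable.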

Note that when singletons are excluded, as recommended for UPARSE, it can be more difficult to tell whether a horizontal asymptote has been reached. In general, how best to calculate and evaluate diversity metrics when singletons have been excluded is an open research problem. However, it seems clear to me that it is much worse to keep singletons and use metrics such as Chao1 that are based on singleton frequencies, because spurious OTUs are strongly biased toward low abundances, especially singletons. See this discussion of singletons.
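
To see why singleton-based estimators are sensitive to spurious OTUs, consider the classic Chao1 estimator, S_obs + F1^2 / (2 F2), where F1 and F2 are the numbers of OTUs with abundance 1 and 2. Chao1 is not one of the metrics reported by fasta_diversity; it appears here only as an illustration, and the abundances are made-up numbers. In this example, ten spurious singletons raise the observed richness by 10 but raise the estimate by 40.

```python
def chao1(sizes):
    """Classic Chao1 richness estimate: S_obs + F1^2 / (2 * F2)."""
    s_obs = sum(1 for n in sizes if n > 0)
    f1 = sum(1 for n in sizes if n == 1)  # singletons
    f2 = sum(1 for n in sizes if n == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form, used here only to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)

true_sizes = [100, 50, 20, 10, 2, 2, 1]  # made-up abundances
noisy_sizes = true_sizes + [1] * 10       # plus ten spurious singletons
print(chao1(true_sizes), chao1(noisy_sizes))  # 7.25 47.25
```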

The metrics are written to a tab-separated text file whose name is specified by the -output option. The first field, Pct, gives the size of the sub-sample as a percentage of the total abundance.
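
As a hypothetical example of using the output, the sketch below parses tab-separated text and applies a simple plateau check to one column. It assumes the first line is a header row naming the columns; apart from Pct, the column name Richness and the sample values are invented, and the 10-point window and 1% tolerance are arbitrary choices rather than a USEARCH recommendation.

```python
import csv
import io

def read_diversity_table(text):
    """Parse tab-separated text, assuming the first line names the columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return {name: [float(row[name]) for row in rows] for name in reader.fieldnames}

def looks_converged(pcts, values, window=10.0, tol=0.01):
    """Heuristic plateau test: the metric changes by at most tol (relative)
    over the last `window` percentage points of the curve."""
    start = pcts[-1] - window
    earlier = next(v for p, v in zip(pcts, values) if p >= start)
    return abs(values[-1] - earlier) <= tol * abs(values[-1])

# Invented values; "Richness" is a hypothetical column name.
text = "Pct\tRichness\n0\t0\n50\t90\n90\t99.5\n100\t100\n"
table = read_diversity_table(text)
print(looks_converged(table["Pct"], table["Richness"]))  # True
```

For a real run, the text would come from the file named by -output, for example open("diversity.txt").read().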

**Example**

usearch -fasta_diversity otus.fa -output diversity.txt -iters 100