The fasta_diversity command reports several
single-sample (alpha) diversity metrics
calculated from the number of sequences and size
annotations (abundances) in a FASTA file. Typically, the FASTA file is the
output of the cluster_otus command run with the -sizein
and -sizeout options.
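To make the input concrete, the following is a minimal Python sketch of how abundances could be read from size annotations in FASTA labels. It assumes the usual USEARCH label form, e.g. ">Otu1;size=42;". The function name read_sizes and the sample labels are hypothetical and are not part of usearch.

```python
import re

# Assumed label form: ">Otu1;size=42;" (the trailing semicolon is optional here).
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def read_sizes(fasta_text):
    """Return one abundance per sequence label found in the FASTA text."""
    sizes = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines carry no annotation
        match = SIZE_RE.search(line[1:].strip())
        if match is None:
            raise ValueError(f"no size annotation in label: {line!r}")
        sizes.append(int(match.group(1)))
    return sizes


# Hypothetical three-OTU input used only for illustration.
example = ">Otu1;size=42;\nACGT\n>Otu2;size=7;\nGGCC\n>Otu3;size=1;\nTTAA\n"
print(read_sizes(example))
```

The resulting list of abundances is the only information the diversity metrics need; the sequences themselves are not used.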
The reported metrics include:
Richness index (number of OTUs).
Jost index (effective number of OTUs) with q = 0, 1, 2, 3, and 4.
Shannon entropy index.
Simpson concentration index.
HCDT entropy index.
Renyi entropy index.
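The metrics listed above can be sketched in Python using their standard textbook definitions. This is an illustration only: the natural logarithm, the treatment of q = 1 as the Shannon limit, and all function names are assumptions, and the exact formulas and log base used by usearch may differ.

```python
import math


def proportions(sizes):
    """Relative abundances p_i of the OTUs with non-zero size."""
    total = sum(sizes)
    return [s / total for s in sizes if s > 0]


def richness(sizes):
    """Number of OTUs observed."""
    return sum(1 for s in sizes if s > 0)


def shannon(sizes):
    """Shannon entropy, -sum(p * ln p); natural log is an assumption."""
    return -sum(p * math.log(p) for p in proportions(sizes))


def simpson(sizes):
    """Simpson concentration, sum(p^2)."""
    return sum(p * p for p in proportions(sizes))


def power_sum(sizes, q):
    return sum(p ** q for p in proportions(sizes))


def jost(sizes, q):
    """Effective number of OTUs of order q (Hill number)."""
    if q == 1:
        return math.exp(shannon(sizes))  # limit as q -> 1
    return power_sum(sizes, q) ** (1.0 / (1.0 - q))


def hcdt(sizes, q):
    """HCDT (Tsallis) entropy of order q."""
    if q == 1:
        return shannon(sizes)  # limit as q -> 1
    return (1.0 - power_sum(sizes, q)) / (q - 1.0)


def renyi(sizes, q):
    """Renyi entropy of order q."""
    if q == 1:
        return shannon(sizes)  # limit as q -> 1
    return math.log(power_sum(sizes, q)) / (1.0 - q)


# Four equally abundant OTUs: every effective number should equal 4.
even = [10, 10, 10, 10]
for q in (0, 1, 2, 3, 4):
    print(q, jost(even, q))
```

With perfectly even abundances the Jost index equals the richness for every q; as abundances become more skewed, higher values of q give progressively smaller effective numbers because rare OTUs contribute less.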
A rarefaction curve is calculated for each metric by
randomized sub-sampling in 1% intervals from zero to 100% of the reads (i.e.,
from zero to 100% of the total abundance given by the size annotations). If a
given metric reaches a horizontal asymptote, this implies that sufficient
observations have been made to give a reasonable estimate. If the metric does
not converge, this implies that more observations are needed to get a good
estimate. For each sub-sample size, metrics are averaged over the number of
repetitions specified by the -iters option (default 32).
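The rarefaction procedure described above can be sketched as follows. This is a hedged illustration, not the usearch implementation: it assumes sub-sampling of reads without replacement, and the names rarefaction_curve, step_pct, and seed are hypothetical. Only iters mirrors the -iters option.

```python
import random


def richness(counts):
    """Number of OTUs with at least one read in the sub-sample."""
    return sum(1 for c in counts if c > 0)


def rarefaction_curve(sizes, metric, iters=32, step_pct=1, seed=0):
    """Average `metric` over `iters` random sub-samples at each percentage."""
    rng = random.Random(seed)
    # One entry per read, labelled with the index of its OTU.
    reads = [otu for otu, size in enumerate(sizes) for _ in range(size)]
    curve = []
    for pct in range(0, 101, step_pct):
        n = len(reads) * pct // 100
        total = 0.0
        for _ in range(iters):
            # Assumption: reads are drawn without replacement.
            sample = rng.sample(reads, n)
            counts = [0] * len(sizes)
            for otu in sample:
                counts[otu] += 1
            total += metric(counts)
        curve.append((pct, total / iters))
    return curve


# Hypothetical abundances; a coarse 10% step keeps the printout short.
curve = rarefaction_curve([50, 30, 15, 5], richness, iters=8, step_pct=10)
for pct, value in curve:
    print(pct, value)
```

If the averaged values stop changing well before 100%, the curve has flattened and the metric can be considered stable at the available sequencing depth; a curve that is still rising at 100% suggests that more reads are needed.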
Note that when singletons are excluded, as recommended for
UPARSE, it can be more difficult to tell whether a horizontal asymptote has been
reached. In general, the question of how best to calculate and evaluate
diversity metrics when singletons have been excluded is an open research problem.
However, it seems clear to me that it is much worse to keep singletons and use
metrics such as Chao1 that are based on singleton frequencies, because spurious
OTUs are strongly biased toward low abundances, especially singletons.
See this discussion of singletons.
The metrics are written to a tab-separated text file whose name is
specified by the -output option. The first field is Pct, which gives the size of
usearch -fasta_diversity otus.fa -output diversity.txt -iters 100