Single-sample diversity metrics (alpha diversity)

**See also**
fasta_diversity command

Multiple-sample diversity metrics (beta diversity)

**Single-sample (alpha) diversity**
A single-sample diversity metric attempts to capture the intuitive notion of
"diversity" by calculating a single number from one set of observations of
individuals. The individuals must be assigned to a group, e.g., species or OTU.
Here, I will call the groups "OTUs" and individuals "reads". The number of reads
assigned to a given OTU is the abundance of that OTU.

**Diversity index**
A diversity index is a metric that characterizes the OTUs that were observed
without extrapolating to consider rare OTUs that were not observed due to
sampling. The simplest example is richness, which is the number of OTUs that
were observed. More sophisticated diversity metrics consider abundances so that
high-abundance OTUs are weighted differently from low-abundance OTUs.

**Diversity estimator**
A diversity estimator is a metric that attempts to extrapolate to account
for rare OTUs that were missed due to sampling. Estimators make mathematical
assumptions about the shape of the tail of the abundance distribution.

**Richness index**
Richness (Wikipedia) is the simplest diversity index; it is just the number of
OTUs that were observed in the sample.
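
As a minimal sketch of the definition above, richness can be computed by counting reads per OTU and then counting the OTUs that received at least one read. The function names and the example OTU labels are hypothetical and are not part of any particular tool.

```python
from collections import Counter


def otu_abundances(read_assignments):
    """Count the reads assigned to each OTU (one OTU label per read)."""
    return Counter(read_assignments)


def richness(abundances):
    """Number of OTUs observed, i.e. with at least one read."""
    return sum(1 for count in abundances.values() if count > 0)


# Hypothetical example: six reads assigned to three OTUs.
reads = ["Otu1", "Otu2", "Otu1", "Otu3", "Otu1", "Otu2"]
abundances = otu_abundances(reads)
print(richness(abundances))  # 3
```

Richness ignores abundances entirely: an OTU with one read counts the same as an OTU with a thousand reads.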

**Simpson index**
The Simpson index (Wikipedia) is the probability that two individuals taken at
random from the sample belong to the same OTU.
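
The probability described above can be sketched in two common ways, depending on whether the two reads are drawn with or without replacement. The text does not say which is intended, so both are shown here as assumptions; the function names are hypothetical.

```python
def simpson_with_replacement(abundances):
    """Sum of squared OTU frequencies: two reads drawn with replacement."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("need at least one read")
    return sum((n / total) ** 2 for n in abundances)


def simpson_without_replacement(abundances):
    """Probability that two distinct reads drawn from the sample share an OTU."""
    total = sum(abundances)
    if total < 2:
        raise ValueError("need at least two reads")
    same_otu_pairs = sum(n * (n - 1) for n in abundances)
    return same_otu_pairs / (total * (total - 1))


print(simpson_with_replacement([5, 5]))  # 0.5, two equally abundant OTUs
print(simpson_with_replacement([10]))    # 1.0, all reads in one OTU
```

Note that with this definition a larger value means lower diversity, since a sample dominated by one OTU gives a probability close to one.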

**Shannon index**
The Shannon index (Wikipedia)
is also known as Shannon entropy, the Shannon-Wiener index and the
Shannon-Weaver index. It is a fundamental quantity in information theory that
can be interpreted as the amount of uncertainty inherent in the abundance
distribution. If there are many OTUs with equal abundances, the entropy is
maximized because it is hard to predict which OTU you would find by randomly
picking a read. On the other hand, if all the reads belong to one large OTU,
then the entropy is minimized because there is no uncertainty about which OTU
you will pick.
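
The two extremes described above can be checked with a short sketch of Shannon entropy computed from OTU abundances. The logarithm base is an assumption (base 2 is used here, giving entropy in bits; other bases only rescale the value), and the function name is hypothetical.

```python
import math


def shannon_entropy(abundances):
    """Shannon entropy in bits of the OTU frequency distribution."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("need at least one read")
    entropy = 0.0
    for n in abundances:
        if n > 0:  # OTUs with zero reads contribute nothing
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


print(shannon_entropy([25, 25, 25, 25]))  # 2.0, four equally abundant OTUs
print(shannon_entropy([100]))             # 0.0, all reads in one OTU
print(shannon_entropy([97, 1, 1, 1]))     # between the two extremes
```

For a fixed number of OTUs, the equal-abundance case gives the largest possible value, which is the base-2 logarithm of the number of OTUs.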

**Jost index (effective number of species)**
The Jost index calculates an effective number
of OTUs. The index has a parameter (

**Chao1 estimator**
The Chao1 estimator is popular, but in my opinion it should not be used with
OTUs obtained by clustering NGS reads. It is calculated as Chao1 =