Multiple-sample diversity metrics (beta diversity)
Single-sample (alpha) diversity
A single-sample diversity metric attempts to capture the intuitive notion of "diversity" by calculating a single number from one set of observations of individuals. The individuals must be assigned to a group, e.g., species or OTU. Here, I will call the groups "OTUs" and individuals "reads". The number of reads assigned to a given OTU is the abundance of the OTU.
A diversity index is a metric that characterizes the OTUs that were observed without extrapolating to consider rare OTUs that were not observed due to sampling. The simplest example is richness, which is the number of OTUs that were observed. More sophisticated diversity metrics consider abundances so that high-abundance OTUs are weighted differently from low-abundance OTUs.
A diversity estimator is a metric that attempts to extrapolate to account for rare OTUs that were missed due to sampling. Estimators make mathematical assumptions about the shape of the tail of the abundance distribution.
Richness (Wikipedia) is the simplest diversity index; it is just the number of OTUs.
The Simpson index (Wikipedia) is the probability that two individuals taken at random from the sample belong to the same OTU.
The Shannon index (Wikipedia) is also known as Shannon entropy, the Shannon-Wiener index and the Shannon-Weaver index. It is a fundamental quantity in information theory that can be interpreted as the amount of uncertainty inherent in the abundance distribution. If there are many OTUs with equal abundances, the entropy is maximized because it is hard to predict which OTU you would find by randomly picking a read. On the other hand, if all the reads belong to one large OTU, then the entropy is minimized because there is no uncertainty about which OTU you will pick.
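The three indices above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the Simpson index assumes the two reads are drawn with replacement (so it is the sum of squared frequencies), and the Shannon index assumes the natural logarithm unless another base is passed.

```python
import math


def _frequencies(abundances):
    """Convert OTU abundances (read counts) to relative frequencies."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("need at least one read")
    return [n / total for n in abundances if n > 0]


def richness(abundances):
    """Number of OTUs that were observed (abundance > 0)."""
    return sum(1 for n in abundances if n > 0)


def simpson(abundances):
    """Probability that two reads share an OTU.

    Assumption: reads are drawn with replacement.
    """
    return sum(p * p for p in _frequencies(abundances))


def shannon(abundances, base=math.e):
    """Shannon entropy of the abundance distribution.

    Assumption: natural log by default; pass base=2 for bits.
    """
    return -sum(p * math.log(p, base) for p in _frequencies(abundances))


# Hypothetical samples: four equally abundant OTUs vs. one dominant OTU.
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

for name, sample in (("even", even), ("skewed", skewed)):
    print(name, richness(sample), simpson(sample), shannon(sample))
```

Both samples have the same richness, but the even sample has the lower Simpson index and the higher Shannon entropy, which is the behavior described above.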
Jost index (effective number of species)
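The Jost index calculates an effective number of OTUs, i.e. the number of equally abundant OTUs that would give the same diversity value. The index has a parameter (q) which determines how abundance is weighted: with q = 0 all observed OTUs count equally, so the index reduces to richness, while larger values of q give more weight to high-abundance OTUs.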
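As a rough sketch, and assuming the usual Hill-number definition of the effective number of species (the sum of frequencies raised to the power q, all raised to 1 / (1 - q), with the q = 1 case taken as the exponential of the Shannon entropy), the index could be computed like this. The function name is hypothetical.

```python
import math


def jost_index(abundances, q):
    """Effective number of OTUs of order q.

    Assumption: Hill-number definition; q = 1 is handled as the
    limit exp(Shannon entropy) because the general formula divides by 1 - q.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("need at least one read")
    freqs = [n / total for n in abundances if n > 0]
    if math.isclose(q, 1.0):
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


# Hypothetical sample dominated by one OTU.
skewed = [97, 1, 1, 1]
for q in (0, 1, 2):
    print(q, jost_index(skewed, q))
```

For a sample with equal abundances every order q gives the richness; for a skewed sample the value falls as q increases, because the rare OTUs contribute less and less.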
The Chao1 estimator is popular, but in my opinion it should not be used with OTUs obtained by clustering NGS reads. It is calculated as Chao1 = N + S^2 / (2 D), where N = number of OTUs, S = number of singletons (OTUs with abundance 1) and D = number of doublets (OTUs with abundance 2). The problem with this metric is that spurious OTUs due to sequencing and PCR errors are strongly biased towards low abundance, so we expect S and D to be overestimated, but we don't know by how much. See discussion of singletons.
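To make the sensitivity to spurious low-abundance OTUs concrete, here is a small sketch using the standard form of the estimator, Chao1 = N + S^2 / (2 D). The function name and the sample counts are made up for illustration, and raising an error when there are no doublets is a choice made for this sketch (the formula is undefined in that case).

```python
def chao1(abundances):
    """Chao1 estimate: N + S^2 / (2 D).

    N = observed OTUs, S = singletons, D = doublets.
    Assumption: raise when D == 0, where this form is undefined.
    """
    n_otus = sum(1 for n in abundances if n > 0)
    singletons = sum(1 for n in abundances if n == 1)
    doublets = sum(1 for n in abundances if n == 2)
    if doublets == 0:
        raise ValueError("Chao1 in this form is undefined without doublets")
    return n_otus + singletons ** 2 / (2 * doublets)


# Hypothetical sample: 8 OTUs, 4 singletons, 2 doublets.
clean = [10, 5, 2, 2, 1, 1, 1, 1]
# Same sample plus 4 spurious singleton OTUs (e.g. from read errors).
noisy = clean + [1, 1, 1, 1]

print(chao1(clean))  # 12.0
print(chao1(noisy))  # 28.0
```

Four spurious singletons raise the observed richness from 8 to 12 but raise the estimate from 12 to 28, because S enters the formula squared.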