Estimating microbial diversity
The diversity in a single sample (alpha diversity) is commonly measured using metrics such as the Shannon index and the Chao1 estimator, while the variation between pairs of samples (beta diversity) is measured using metrics such as the Jaccard distance or Bray-Curtis dissimilarity. Many such metrics, including Shannon, Chao1, Jaccard and Bray-Curtis, are calculated from OTU frequencies. Other metrics, e.g. unweighted UniFrac (called unifrac_binary in usearch), use presence / absence only, effectively treating any non-zero count as one.
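To make the distinction between frequency-based and presence / absence metrics concrete, here is a minimal sketch using textbook formulas. The function names and the counts are made up for illustration, and details such as the log base may differ from what usearch computes.

```python
import math

def shannon(counts):
    """Shannon index using the natural log; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same OTUs."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def to_binary(counts):
    """Presence / absence: any non-zero count becomes one."""
    return [1 if c > 0 else 0 for c in counts]

# Hypothetical counts for four OTUs in two samples.
sample_a = [40, 30, 20, 10]
sample_b = [10, 20, 30, 40]

print(shannon(sample_a))                 # alpha diversity of one sample
print(bray_curtis(sample_a, sample_b))   # beta diversity of a pair
print(to_binary([7, 0, 1, 250]))         # counts 7, 1 and 250 all become 1
```

Note that both samples become identical after conversion to presence / absence, even though their frequencies differ, so the two kinds of metric can give very different answers on the same table.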
OTU frequency does not correlate with species frequency
OTU frequencies correlate poorly with species frequencies. This means, for example, that the most abundant OTU usually does not contain the most abundant species.
Cross-talk degrades presence / absence
Some diversity metrics use OTU presence / absence rather than frequencies. In usearch, such metrics are called "binary" because the count is considered to be zero or one. With amplicon reads, presence / absence cannot be reliably measured if samples are multiplexed, because cross-talk often causes reads to be assigned to a sample where the OTU is in fact absent. This problem is particularly severe if samples from different environments (e.g., human gut and mouse gut) are multiplexed into a single sequencing run.
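The effect of cross-talk on a binary metric can be illustrated with a toy example. This is only a sketch: the counts are invented, and jaccard_binary here is a plain textbook implementation rather than the usearch code.

```python
def jaccard_binary(a, b):
    """Jaccard distance on presence / absence: 1 - shared / union."""
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    union = sum(1 for x, y in zip(a, b) if x > 0 or y > 0)
    return 1 - shared / union

# Hypothetical true counts: the two samples share no OTUs.
human_true = [500, 300, 0, 0]
mouse_true = [0, 0, 400, 200]

# The same samples after a handful of reads are mis-assigned by cross-talk.
human_obs = [500, 300, 2, 0]
mouse_obs = [1, 0, 400, 200]

print(jaccard_binary(human_true, mouse_true))  # 1.0
print(jaccard_binary(human_obs, mouse_obs))    # 0.5
```

Three mis-assigned reads out of about 1,400 are enough to halve the distance, because a binary metric gives a count of 1 the same weight as a count of 500.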
Singleton counts are especially suspect
If you follow my recommended procedures, you will pool reads for all samples, discard singleton unique sequences when making 97% OTUs, and discard unique sequences with abundance < 8 when making ZOTUs (denoising). Even so, OTU table entries for smaller OTUs are often singletons (i.e., have value 1) because the total count is distributed over several samples. Small counts, especially singletons, are more likely to be spurious, either because the OTU itself is spurious (e.g., an undetected chimera), or because of cross-talk.
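A quick way to see how many entries in a table are singletons is to count them directly. The sketch below assumes a hypothetical OTU table held as a dictionary from OTU name to per-sample counts; singleton_fraction is an illustrative helper, not a usearch command.

```python
def singleton_fraction(otu_table):
    """Fraction of non-zero OTU table entries that are exactly 1."""
    entries = [c for counts in otu_table.values() for c in counts if c > 0]
    return sum(1 for c in entries if c == 1) / len(entries)

# Hypothetical table: three OTUs (rows) by three samples (columns).
otu_table = {
    "Otu1": [120, 98, 143],  # large OTU, no singleton entries
    "Otu2": [1, 0, 3],       # small OTU, one singleton entry
    "Otu3": [1, 1, 1],       # total count 3 spread over three samples
}

print(singleton_fraction(otu_table))  # 0.5
```

Otu3 would survive singleton filtering on the pooled reads because its total is 3, yet every one of its table entries is a singleton.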
Traditional diversity metrics are invalid or hard to interpret
Because of the issues described above, many diversity metrics are invalid, meaningless or hard to interpret when calculated from OTUs. Some alpha diversity metrics, including Chao1 and Robbins, explicitly use singleton counts or singleton frequencies in their formulas. If singleton unique reads or singleton OTUs are discarded, then these calculations are obviously invalid. Either way, singleton counts are suspect as described above, so the calculations are misleading or meaningless in practice. All beta diversity metrics use OTU frequencies or presence / absence, neither of which can be reliably determined from amplicon reads.
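The dependence of Chao1 on singletons can be checked with a small calculation. The sketch below uses the standard bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs with count 1 and 2. The counts are invented, and the formula used by a given tool may differ in detail.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from OTU counts in one sample."""
    s_obs = sum(1 for c in counts if c > 0)  # observed OTUs
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical counts for one sample, including four singleton OTUs.
with_singletons = [50, 20, 5, 2, 2, 1, 1, 1, 1]

# The same sample after singleton entries have been discarded.
without_singletons = [c for c in with_singletons if c != 1]

print(chao1(with_singletons))     # 11.0
print(chao1(without_singletons))  # 5.0
```

Once singletons are removed, F1 is zero and the estimate collapses to the observed number of OTUs, so the formula no longer estimates anything.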