Beta diversity metrics

A beta diversity metric compares the OTU abundances (counts or frequencies) in two samples by calculating a single number that indicates how similar or different the samples are. Beta diversity metrics are calculated using the beta_div command.

Metric names that end with _binary are calculated from presence or absence alone. The numerical value of the abundance is ignored: the calculation is the same, except that the abundance is taken to be one if the OTU is present and zero otherwise. If the name does not end with _binary, the abundance is the count (number of reads). Note that because of cross-talk, presence or absence cannot be reliably established for low-abundance OTUs, so binary metrics are generally not recommended.
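The difference between a count-based metric and its _binary variant can be sketched in a few lines of Python. This is only an illustration: it uses the standard textbook Bray-Curtis formula, which is not confirmed to match the usearch implementation in every detail, and the sample counts and function names are hypothetical.

```python
def bray_curtis(a, b):
    """Textbook Bray-Curtis dissimilarity between two abundance vectors.

    Assumption: standard formula sum|a_i - b_i| / sum(a_i + b_i);
    usearch may differ in details such as normalization.
    """
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def to_binary(counts):
    """Replace each count by 1 if the OTU is present, 0 otherwise."""
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical read counts for four OTUs in two samples.
# The last two OTUs have a single read in one sample only,
# which could easily be due to cross-talk.
sample_a = [100, 50, 1, 0]
sample_b = [90, 60, 0, 1]

abundance_based = bray_curtis(sample_a, sample_b)
presence_based = bray_curtis(to_binary(sample_a), to_binary(sample_b))

print(f"count-based: {abundance_based:.3f}")
print(f"binary:      {presence_based:.3f}")
```

In this made-up example the two single-read OTUs barely affect the count-based value, but they account for the entire binary value, which is why unreliable low-abundance calls are a problem for binary metrics.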

For consistency and simplicity, all the supported metrics are dissimilarity measures, meaning that they are zero when samples are identical and have larger values when the samples are different. This type of measure is sometimes called a distance metric, but not all of these are distances in the strict mathematical sense; Bray-Curtis, for example, does not satisfy the triangle inequality.

In usearch, beta diversities are always dissimilarity measures, not similarity measures, so increasing values indicate lower similarity and greater distance. For any distance measure D that ranges between zero and one there is an equivalent similarity measure S defined by S = 1 - D, for example (Jaccard similarity) = 1 - (Jaccard distance). You can easily convert between distance and similarity measures in a spreadsheet program such as Excel.
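The same conversion can be done in a script instead of a spreadsheet. The following is a minimal sketch: the sample names and distance values are invented, and the function name is hypothetical.

```python
def similarity_from_distance(d):
    """Convert a distance D in [0, 1] to a similarity S = 1 - D."""
    if not 0.0 <= d <= 1.0:
        # The conversion only makes sense for metrics bounded by 0 and 1.
        raise ValueError(f"distance {d} is outside the range 0..1")
    return 1.0 - d


# Hypothetical pairwise distances between three samples.
distances = {
    ("S1", "S2"): 0.25,
    ("S1", "S3"): 0.8,
    ("S2", "S3"): 1.0,
}

similarities = {
    pair: similarity_from_distance(d) for pair, d in distances.items()
}

for (first, second), s in similarities.items():
    print(f"{first}\t{second}\t{s:.2f}")
```

The range check is a deliberate design choice: unbounded metrics have no natural similarity counterpart, so the sketch rejects them rather than returning a negative value.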

Metrics with a range of 0 .. 1 can be used for sample clustering, i.e. to generate a tree in which leaves are samples and more similar samples are closer together. The Euclidean and Manhattan distance metrics can take arbitrarily large values, so they are not appropriate for clustering.

Metric	Max value	Cluster	Description
bray_curtis	1	Y	Bray-Curtis
bray_curtis_binary	1	Y	Bray-Curtis
euclidean	(no maximum)	N	Euclidean distance
jaccard	1	Y	Jaccard coefficient
jaccard_binary	1	Y	Jaccard coefficient
manhatten	(no maximum)	N	Manhattan distance
unifrac	1	Y	UniFrac
unifrac_binary	1	Y	UniFrac
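
To illustrate why the metrics marked N in the Cluster column have no maximum, the sketch below compares the same pair of hypothetical samples at two sequencing depths. It uses the standard textbook formulas for Manhattan, Euclidean and Bray-Curtis, which are assumed rather than confirmed to match the usearch implementation; all names and counts are made up.

```python
import math


def manhattan(a, b):
    """Sum of absolute differences in abundance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the sum of squared differences in abundance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bray_curtis(a, b):
    """Textbook Bray-Curtis: sum|a_i - b_i| / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    return manhattan(a, b) / total if total else 0.0


# Hypothetical counts for three OTUs at low sequencing depth.
shallow_a = [10, 0, 5]
shallow_b = [0, 10, 5]

# The same communities sequenced 100 times deeper.
deep_a = [c * 100 for c in shallow_a]
deep_b = [c * 100 for c in shallow_b]

for label, a, b in [("shallow", shallow_a, shallow_b),
                    ("deep", deep_a, deep_b)]:
    print(f"{label}\tmanhattan={manhattan(a, b)}"
          f"\teuclidean={euclidean(a, b):.1f}"
          f"\tbray_curtis={bray_curtis(a, b):.3f}")
```

With these formulas the Manhattan and Euclidean values grow in proportion to the number of reads, while the Bray-Curtis value is unchanged and stays between zero and one.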