Abundance rarefaction

See also
  alpha_div_rare command

Suppose you observe K different species, finding N_k individuals in the kth species. The sum N = N_1 + N_2 + ... + N_K is the total number of observations. To calculate the rarefaction curve for (number of observed species) vs. (number of observations), we take random subsets with n = 1, 2, ..., N observations and count the number of species S(n) found in each subset. (More accurately, we should average over many different random subsets at each size n until the mean converges.) Some species will disappear with fewer observations, and the curve will approach zero species as n approaches zero. It is not actually necessary to implement subsampling and averaging; S(n) can be calculated using this formula:

\[ S(n) = K - \binom{N}{n}^{-1} \sum_{k=1}^{K} \binom{N - N_k}{n} \]
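A minimal Python sketch of the closed-form calculation may help. It assumes the formula meant here is the standard expectation for sampling without replacement, in which species k is missed by a subsample of size n with probability C(N - N_k, n) / C(N, n). The function name `rarefaction_expected` and the example counts are illustrative, not part of any existing software.

```python
from math import comb  # requires Python 3.8+


def rarefaction_expected(counts, n):
    """Expected number of species in a random subsample of n observations.

    counts: list of N_k, the number of individuals observed per species.
    Subsampling is without replacement from the N = sum(counts) observations.
    """
    N = sum(counts)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    # Species k is absent from the subsample with probability
    # C(N - N_k, n) / C(N, n); comb() returns 0 when n > N - N_k.
    missed = sum(comb(N - nk, n) for nk in counts)
    return len(counts) - missed / comb(N, n)


# Made-up example: K = 4 species, N = 10 observations.
counts = [5, 3, 1, 1]
curve = [rarefaction_expected(counts, n) for n in range(sum(counts) + 1)]
```

The curve starts at zero species for n = 0 and reaches K at n = N, matching the behaviour described above. The exact integer binomials keep the sketch simple; for very large N a log-gamma formulation would likely be faster.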


My colleague Henrik Flyvbjerg and I have developed a modified formula for S(n) that applies when singletons are discarded; let me know if you are interested and I will send you the details.

Rarefaction for OTUs
If species with exactly one observation are ignored, then the above formula does not apply. Thus, if singleton reads are discarded, as recommended in the UPARSE pipeline, you cannot use standard rarefaction software, because the above formula does not give the correct result. If singletons are retained, the formula is a reasonable approximation but is not exactly correct, because de novo OTUs are necessarily "unstable": a given pair of sequences may belong to the same OTU in one subset but to different OTUs in another.
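The effect of ignoring singletons can be illustrated with a small simulation. This is only a sketch under a stated assumption: a species is counted in a subsample only if it appears there at least twice. It is not the modified formula mentioned above, and the function names and counts are made up for illustration.

```python
import random
from collections import Counter
from math import comb


def standard_formula(counts, n):
    """Standard rarefaction expectation (singletons retained)."""
    N = sum(counts)
    return len(counts) - sum(comb(N - nk, n) for nk in counts) / comb(N, n)


def simulate_without_singletons(counts, n, replicates=2000, seed=0):
    """Mean number of species seen at least twice in subsamples of size n.

    Assumption: singletons are discarded within each subsample.
    """
    rng = random.Random(seed)
    # One entry per observation, labelled by species index.
    pool = [k for k, nk in enumerate(counts) for _ in range(nk)]
    total = 0
    for _ in range(replicates):
        seen = Counter(rng.sample(pool, n))  # sample without replacement
        total += sum(1 for c in seen.values() if c >= 2)
    return total / replicates


counts = [5, 3, 1, 1]  # made-up example data
n = 5
with_singletons = standard_formula(counts, n)
without_singletons = simulate_without_singletons(counts, n)
```

In this toy example the simulated value is well below the value given by the standard formula, which is the sense in which the formula stops applying once singletons are dropped.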

With de novo OTUs, including those made by UPARSE, rarefaction curves must strictly speaking be generated by running the pipeline from scratch on each random subsample and noting the number of OTUs obtained.
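The procedure can be sketched in Python as a loop over subsample sizes. Here `make_otus` is a hypothetical callable standing in for the complete OTU pipeline (it is not a USEARCH function), and the number of replicates and the subsample sizes are arbitrary choices.

```python
import random


def rarefaction_by_rerun(reads, make_otus, sizes, replicates=10, seed=0):
    """Mean OTU count at each subsample size, re-clustering every subsample.

    reads: list of reads (any objects that make_otus accepts).
    make_otus: hypothetical callable wrapping the full OTU pipeline; it takes
        a list of reads and returns a collection of OTUs.
    sizes: subsample sizes n, each between 0 and len(reads).
    """
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        total = 0
        for _ in range(replicates):
            subsample = rng.sample(reads, n)  # without replacement
            total += len(make_otus(subsample))  # pipeline run from scratch
        curve[n] = total / replicates
    return curve


# Toy stand-in for a pipeline: each distinct string counts as one OTU.
toy_reads = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
curve = rarefaction_by_rerun(toy_reads, lambda rs: set(rs), sizes=[0, 5, 10])
```

With a real pipeline each call to `make_otus` may be expensive, so in practice the number of sizes and replicates would need to be kept small.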