USEARCH v11

FAQ: Should you use UPARSE or UNOISE?

There are two different ways to make OTUs: 97% clustering and denoising.

The UPARSE algorithm makes 97% OTUs.

The UNOISE algorithm does denoising, i.e. error-correction.

If UPARSE works perfectly, it will give you a subset of the correct biological sequences in your reads such that no two sequences are >97% identical to each other. It is implemented in the cluster_otus command.

If UNOISE works perfectly, it will give you all the correct biological sequences in the reads. It is implemented in the unoise3 command.
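To make the two idealized outcomes concrete, here is a minimal Python sketch. It is not the UPARSE or UNOISE algorithm: the real algorithms also deal with sequencing errors and chimeras and use proper alignments. The helper names (`identity`, `mutate`, `pick_otus`) and the toy sequences are made up for illustration, and identity is computed by simple position-by-position comparison of equal-length sequences.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example assumes equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mutate(seq: str, positions) -> str:
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


def pick_otus(seqs_by_abundance, threshold: float = 0.97):
    """Greedily keep a sequence only if it is not >threshold identical to any kept one."""
    kept = []
    for seq in seqs_by_abundance:
        if all(identity(seq, k) <= threshold for k in kept):
            kept.append(seq)
    return kept


# Three hypothetical correct biological sequences, most abundant first.
species_a = "ACGT" * 25                            # 100 nt
strain_a2 = mutate(species_a, [10, 50])            # 98% identical to species_a
species_b = mutate(species_a, range(0, 100, 10))   # 90% identical to species_a

correct = [species_a, strain_a2, species_b]

ideal_unoise = list(correct)       # perfect denoising: every correct sequence
ideal_uparse = pick_otus(correct)  # perfect 97% clustering: a subset

print(len(ideal_unoise), len(ideal_uparse))
```

In this toy case the idealized denoising result keeps all three sequences, while the idealized 97% result drops `strain_a2` because it is 98% identical to `species_a`.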

(Of course, no algorithm is perfect, so you should expect some mistakes.)

The pipeline for running UPARSE and UNOISE is essentially the same; the only difference is whether you run cluster_otus or unoise3 as the clustering step.
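The one-step difference can be sketched as follows. This Python snippet only builds the command lines, it does not run them. The file names (`uniques.fa`, `otus.fa`, `zotus.fa`) and the function name are hypothetical, and the output options shown (`-otus`, `-zotus`) are given as I understand them from the USEARCH manual, so check them against the documentation for your version.

```python
def otu_step_command(method: str, uniques: str = "uniques.fa") -> list:
    """Return the usearch argument list for the OTU-making step only."""
    if method == "uparse":
        # 97% clustering (assumed output option: -otus)
        return ["usearch", "-cluster_otus", uniques, "-otus", "otus.fa"]
    if method == "unoise":
        # denoising (assumed output option: -zotus)
        return ["usearch", "-unoise3", uniques, "-zotus", "zotus.fa"]
    raise ValueError(f"unknown method: {method!r}")


for method in ("uparse", "unoise"):
    print(" ".join(otu_step_command(method)))
```

Everything before this step (merging, filtering, finding uniques) and everything after it (making the OTU table) stays the same. If you do want to execute a command from Python, `subprocess.run(cmd, check=True)` takes the list as-is.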

Once you have made an OTU table, you can proceed with diversity analysis etc. in the same way, regardless of whether you used UPARSE or UNOISE.

Which should you choose? I suggest you try both. If a biological conclusion differs between the two methods, then you should worry that neither result is trustworthy and try to understand why this happens. If both methods agree, that tends to confirm the result.

Pros and cons

As of the time of writing in 2017, most published papers use 97% clustering, so this will be easier to explain to your PI and to referees. The main disadvantage of 97% clustering is that you discard some correct biological sequences that are present in your reads. If these represent strains or species with a different phenotype, then you lose relevant information, because the corresponding reads are lumped together into one OTU that contains multiple phenotypes.

The main disadvantage of denoising is that species often have variations between individuals and paralogs that are not 100% identical, so a single species can be split into two or more OTUs. Another disadvantage is that more low-abundance sequences are lost: with UPARSE, singleton uniques are discarded, but with UNOISE, uniques with abundance <8 are discarded. For typical studies, this shouldn't make much difference because samples should be pooled, so a sequence with abundance <8 will probably be a singleton in a few samples. Singletons in the OTU table should not be considered significant because they could be spurious with any method.
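As a rough illustration of the two abundance cutoffs, the Python sketch below pools made-up per-sample counts of unique sequences and applies a minimum abundance of 2 (singletons discarded) and of 8 (abundance <8 discarded). The sample names, sequence labels, counts and function names are all invented for the example, and this is only the filtering step, not the clustering or denoising itself.

```python
from collections import Counter


def pool_abundances(samples: dict) -> Counter:
    """Sum the per-sample counts of each unique sequence over all samples."""
    total = Counter()
    for counts in samples.values():
        total.update(counts)
    return total


def surviving(pooled: Counter, min_abundance: int) -> set:
    """Unique sequences whose pooled abundance reaches the cutoff."""
    return {seq for seq, n in pooled.items() if n >= min_abundance}


# Hypothetical counts of unique sequences in three samples.
samples = {
    "S1": {"seqA": 120, "seqB": 1, "seqC": 1},
    "S2": {"seqA": 95, "seqB": 2},
    "S3": {"seqA": 110, "seqB": 1, "seqD": 1},
}

pooled = pool_abundances(samples)

uparse_input = surviving(pooled, 2)  # singletons discarded
unoise_input = surviving(pooled, 8)  # abundance <8 discarded

print(sorted(uparse_input), sorted(unoise_input))
```

Here `seqB` has pooled abundance 4, so it passes the singleton cutoff but not the <8 cutoff. In the individual samples it appears only once or twice, which is the kind of count that should not be considered significant with either method.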

The main advantage of denoising is that you get better resolution by keeping all the biological sequences. If you have intra-species variations in the region that you sequenced, then you will get two or more OTUs for one species. For most purposes, this really doesn't matter -- it might even be better if this enables you to detect strains with different phenotypes -- so if I have to recommend one method, then I would recommend denoising.