USEARCH v11

OTU QC: chimeras

See also
  Quality control for OTU sequences

The cluster_otus and unoise3 commands have built-in chimera filters. These filters are not perfect, so some non-chimeric OTUs may be incorrectly discarded (false positives) and some chimeric OTUs may pass the filter (false negatives).

In the past, I have suggested using reference-based UCHIME as a post-processing step to filter chimeric OTUs. This is a bad idea with the current implementations of cluster_otus and unoise3 because the error rate of reference-based UCHIME is much higher than the error rates of the built-in de novo filters. Therefore, if you run uchime2_ref on your OTUs, the OTUs that are discarded are more likely to be false positives than true chimeras. The high accuracy claims of the original UCHIME paper were exaggerated because of unrealistic benchmark tests; this is explained in the UCHIME2 paper.

It turns out that it is impossible in principle to reliably distinguish chimeras from correct sequences, even when there are no sequence errors and the reference database is complete. This is a very surprising, almost shocking, result which is reported in the UCHIME2 paper. The reason is "fake models", where a correct sequence can be constructed as a chimera from two other correct sequences. A chimera can have exactly the same sequence as a valid gene, so no algorithm can distinguish the two cases from the sequence alone. Fake models are common in practice, hence the problem.
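To make the fake model idea concrete, here is a minimal Python sketch. It is a toy illustration, not how UCHIME2 works: the sequences are invented, and the helper find_chimeric_model is a hypothetical function that assumes equal-length, gap-free, pre-aligned sequences and an exact match. It shows that a sequence we take to be a genuine gene can also be written exactly as the left part of one reference joined to the right part of another.

```python
from itertools import permutations


def find_chimeric_model(query, references):
    """Search for a two-parent model of `query`.

    Returns (left_name, right_name, crossover) if query == left[:i] + right[i:]
    for some crossover i and two distinct references, otherwise None.
    Simplifying assumption: all sequences have the same length and are
    already aligned without gaps.
    """
    for (left_name, left), (right_name, right) in permutations(references.items(), 2):
        if not (len(left) == len(right) == len(query)):
            continue
        if query == left or query == right:
            continue  # identical to a reference, not a two-parent model
        for i in range(1, len(query)):
            if query == left[:i] + right[i:]:
                return left_name, right_name, i
    return None


# Invented toy sequences. All three are assumed to be real genes.
references = {
    "A": "ACGTACGTAAAA",
    "B": "TTGTACGTCCCC",
}
genuine_c = "ACGTACGTCCCC"

model = find_chimeric_model(genuine_c, references)
if model is not None:
    left_name, right_name, crossover = model
    print(f"genuine_c has a chimeric model: {left_name}[:{crossover}] + {right_name}[{crossover}:]")
```

A PCR chimera formed from A and B in the same region would have exactly the same sequence as genuine_c, so the sequence alone cannot tell the two cases apart.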
 
So, what should you do? I would suggest running uchime2_ref with the largest possible database (which would be SILVA for 16S). Review the results with -mode balanced and -mode high_confidence. Usually, I find that -mode high_confidence is more useful because -mode balanced gives too many questionable predictions. With high_confidence, you will probably see a small number of quite convincing chimeric alignments. It is then a judgement call whether you think these are false positives due to fake models or true chimeras. It's anyone's guess, because it is impossible to distinguish these two cases. If you get a lot of convincing alignments and you think this may be due to problems with the built-in filter in cluster_otus or unoise3, then by all means send them to me for review.
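The two review runs suggested above could be scripted along the following lines. This is a sketch under stated assumptions: the file names otus.fa and silva.udb and the output prefix are hypothetical, and the option spellings (-db, -strand, -mode, -uchimeout, -uchimealnout) should be checked against the uchime2_ref documentation for your USEARCH version. The script only builds and prints the command lines; it does not run usearch.

```python
import shlex


def uchime2_ref_command(otus_fasta, db, mode, out_prefix):
    """Build a uchime2_ref command line as an argument list.

    All file names are placeholders. Option names are assumed from the
    USEARCH documentation and should be verified before use.
    """
    return [
        "usearch", "-uchime2_ref", otus_fasta,
        "-db", db,
        "-strand", "plus",
        "-mode", mode,
        "-uchimeout", f"{out_prefix}.{mode}.txt",     # tabbed report, one line per OTU
        "-uchimealnout", f"{out_prefix}.{mode}.aln",  # alignments to review by eye
    ]


# One run per mode, so the two sets of predictions can be compared.
commands = [
    uchime2_ref_command("otus.fa", "silva.udb", mode, "uchime2")
    for mode in ("balanced", "high_confidence")
]

for cmd in commands:
    print(shlex.join(cmd))
```

To actually run a command, pass the argument list to subprocess.run(cmd, check=True) once usearch is on your PATH. Writing a separate output file per mode keeps the high_confidence alignments, which are usually the ones worth reading, apart from the larger balanced set.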