OTU QC: chimeras

See also
  Quality control for OTU sequences

The cluster_otus and unoise3 commands have built-in filters for chimeras. These filters are not perfect, so there may be cases where non-chimeric OTUs were incorrectly discarded (false positives) or chimeric OTUs were not filtered (false negatives).

In the past, I have suggested using reference-based UCHIME as a post-processing step to filter chimeric OTUs. This is a bad idea with the current implementations of cluster_otus and unoise3 because the error rate of reference-based UCHIME is much higher than the error rates of the built-in de novo filters. Therefore, if you run uchime2_ref on your OTUs, the OTUs that are discarded are more likely to be false positives than true chimeras. The high accuracy claims of the original UCHIME paper were exaggerated because of unrealistic benchmark tests; this is explained in the UCHIME2 paper.

It turns out that it is impossible in principle to distinguish chimeras from correct sequences, even when there are no sequence errors and the reference database is complete. This is a very surprising, almost shocking, result which is reported in the UCHIME2 paper. The reason is "fake models", where a correct sequence can be constructed exactly as a chimera of two other correct sequences. A chimera can therefore have a sequence identical to a valid gene, so no algorithm can distinguish the two cases from the sequence alone. Fake models are common in practice, hence the problem.
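
To make the idea of a fake model concrete, here is a minimal sketch in Python. The three sequences are invented toy strings of equal length (assumed to be already aligned), and the function name chimeric_models is hypothetical. This is not how UCHIME2 searches for models; it is a brute-force check that a sequence assumed to be correct can still be written as a perfect two-parent chimera.

```python
def chimeric_models(query, references):
    """List every (left_parent, right_parent, crossover) such that
    query == left[:crossover] + right[crossover:].

    Assumes all sequences are pre-aligned and have the same length
    (a simplification made for this illustration only).
    """
    models = []
    for left_name, left in references.items():
        for right_name, right in references.items():
            if left_name == right_name:
                continue
            if not (len(left) == len(right) == len(query)):
                continue
            # Try every internal crossover position.
            for x in range(1, len(query)):
                if left[:x] + right[x:] == query:
                    models.append((left_name, right_name, x))
    return models


# Toy data: pretend A, B and C are all genuine biological sequences.
references = {
    "A": "AAAACCCCGGGG",
    "B": "TTTACCCCGGTT",
}
C = "AAAACCCCGGTT"

models = chimeric_models(C, references)
for left_name, right_name, x in models:
    print(f"C = {left_name}[:{x}] + {right_name}[{x}:]")
```

Every model found here is a zero-difference chimeric alignment of C against A and B, even though C was declared to be a real sequence. From the sequences alone, a chimera formed from A and B would look exactly the same.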

So, what should you do? I suggest running uchime2_ref with the largest available reference database (SILVA for 16S). Review the results with both -mode balanced and -mode high_confidence. I usually find -mode high_confidence more useful because -mode balanced gives too many questionable predictions. With high_confidence, you will probably see a small number of quite convincing chimeric alignments. It is then a judgement call whether these are true chimeras or false positives due to fake models. It's anyone's guess, because the two cases cannot be distinguished. If you get a lot of convincing alignments and you think this may be due to problems with the built-in filter in cluster_otus or unoise3, then by all means send them to me for review.
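
The two review runs could be scripted along these lines. This is only a sketch: the file names (otus.fa, silva.udb, the output prefix) are placeholders, the helper uchime2_ref_command is hypothetical, and the options other than -mode (-db, -strand plus, -uchimeout, -uchimealnout) are written as I understand them from the uchime2_ref command page, so check them against your usearch version before running. The script only builds and prints the command lines; it does not execute usearch.

```python
import shlex


def uchime2_ref_command(otus_fasta, db, mode, prefix):
    """Build the argument list for one uchime2_ref run (not executed here).

    Option names other than -mode are assumptions to verify against the
    uchime2_ref documentation for your usearch version.
    """
    return [
        "usearch", "-uchime2_ref", otus_fasta,
        "-db", db,
        "-strand", "plus",
        "-mode", mode,
        "-uchimeout", f"{prefix}.{mode}.txt",      # tabbed per-OTU results
        "-uchimealnout", f"{prefix}.{mode}.aln",   # alignments to review by eye
    ]


# One run per mode, so the two sets of predictions can be compared.
commands = [
    uchime2_ref_command("otus.fa", "silva.udb", mode, "otus_chimeras")
    for mode in ("balanced", "high_confidence")
]

for argv in commands:
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Keeping the outputs of the two modes in separate files makes it easy to see which OTUs are flagged only by balanced and which survive the stricter high_confidence setting.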