Taxonomy confidence measures
The SINTAX algorithm generates
taxonomy predictions with confidence estimates specified as a bootstrap value.
The definition and interpretation of a taxonomy
prediction confidence estimate is not as simple as it might appear. Ideally, the
error rate of predictions with confidence 0.9 should be approximately 10%, but
in practice the error rate depends on the query dataset and on unknown
characteristics of the reference dataset. It would be nice to calculate a p-value,
but this is tricky because we need two statistical models: one for the hypothesis
we are testing plus a null model in which the hypothesis is false and the
observation occurs by chance. The (now deprecated) UTAX algorithm implemented a
method for calculating p-values, but it only works well for 16S genes
and is not decisively better than SINTAX
bootstrapping in practice.
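The ideal described above, where predictions with confidence 0.9 are wrong about 10% of the time, can be checked empirically when the true taxonomy of the query sequences is known. The following is a minimal sketch, not part of any tool: the function name calibration_table, the input format (pairs of confidence and correctness) and the toy data are all assumptions made for illustration.

```python
def calibration_table(results, n_bins=10):
    """Compare the observed error rate with the ideal 1 - confidence, per bin.

    results: iterable of (confidence, is_correct) pairs, confidence in [0, 1].
    Returns a list of (bin_low, bin_high, n, mean_confidence, observed_error).
    This is a hypothetical helper, not part of SINTAX, UTAX or RDP.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in results:
        # A confidence of exactly 1.0 is placed in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    table = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip bins with no predictions
        n = len(members)
        mean_confidence = sum(c for c, _ in members) / n
        observed_error = sum(1 for _, ok in members if not ok) / n
        table.append((index / n_bins, (index + 1) / n_bins, n,
                      mean_confidence, observed_error))
    return table


# Made-up toy data: (confidence, was the prediction correct?)
toy = [(0.95, True), (0.92, True), (0.91, False), (0.97, True),
       (0.55, True), (0.52, False), (0.58, False), (0.51, True)]

for low, high, n, mean_conf, err in calibration_table(toy):
    print(f"[{low:.1f}, {high:.1f})  n={n}  "
          f"ideal error={1 - mean_conf:.2f}  observed error={err:.2f}")
```

If the confidence estimate were ideal, the observed error in each bin would be close to one minus the mean confidence in that bin. As noted above, the result depends on the query and reference datasets used for the test.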
Most taxonomy prediction algorithms don't provide a confidence estimate;
examples include GAST, the default QIIME method (assign_taxonomy.py
-m uclust) and the mothur Classify_seqs command with method=knn. A notable
exception is the RDP Naive Bayesian Classifier (RDP), which reports a
confidence value obtained by bootstrapping.
This was an important improvement over previous methods and is a good reason why
RDP is currently the most widely-used algorithm for 16S taxonomy prediction.
However, everyone agrees that the RDP bootstrap value should not be interpreted
as indicating the probability that the prediction is correct (which would be
100% minus the estimated error probability). The authors claim that for 16S
sequences shorter than 250nt, a bootstrap threshold of 50% gives accurate
results to genus level, with reported accuracies from 79% to 100% depending on the V
region (see discussion and table under "Confidence threshold" at
https://rdp.cme.msu.edu/classifier/class_help.jsp). If this result is
valid, the error rate at 50% bootstrap is presumably much less than 50%.
However, I believe their "leave-one-out" validation seriously underestimates
error rates on real data (for discussion, see validating taxonomy classifiers).
In my tests, I find a 33% error rate for genus predictions by RDP on the V3-V5
segment (~530nt) at a 50% bootstrap cutoff. At 100% bootstrap confidence, the
error rate is 8%.
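An error rate at a given bootstrap cutoff can be computed from a test set with known taxonomy. The sketch below assumes the error rate is defined as the fraction of incorrect predictions among those with a bootstrap value at or above the cutoff; that definition, the function name error_rate_at_cutoff and the toy data are assumptions for illustration and do not reproduce the tests reported above.

```python
def error_rate_at_cutoff(results, cutoff):
    """Error rate among predictions with bootstrap value >= cutoff.

    results: iterable of (bootstrap, is_correct) pairs, bootstrap in [0, 1].
    Returns (n_kept, error_rate); error_rate is None if nothing passes.
    This is a hypothetical helper, not part of RDP or SINTAX.
    """
    kept = [is_correct for bootstrap, is_correct in results
            if bootstrap >= cutoff]
    if not kept:
        return 0, None
    n_wrong = sum(1 for ok in kept if not ok)
    return len(kept), n_wrong / len(kept)


# Made-up toy data: (bootstrap value, was the genus prediction correct?)
toy_genus = [(1.0, True), (1.0, True), (1.0, True), (1.0, False),
             (0.8, True), (0.6, False), (0.5, False), (0.3, False)]

for cutoff in (0.5, 1.0):
    n_kept, rate = error_rate_at_cutoff(toy_genus, cutoff)
    print(f"cutoff={cutoff:.1f}  predictions kept={n_kept}  error rate={rate:.2f}")
```

Raising the cutoff typically lowers the error rate but also discards more predictions, so the number of predictions kept is worth reporting alongside the error rate.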
The SINTAX bootstrap
value has similar accuracy to RDP on V4 sequences. On ITS sequences and
full-length 16S sequences, the SINTAX bootstrap value is significantly better
(see SINTAX paper).