UTAX reference data downloads
Taxonomy benchmark results
UTAX is an algorithm for taxonomy assignment which is implemented in the utax command. The cluster_otus_utax command generates OTUs based on taxa predicted by UTAX.
The main advantages of UTAX over previous classifiers such as the RDP Naive Bayesian Classifier (RDP) are very high speed, informative confidence values and flexible options for training on user-supplied data.
The algorithm has not yet been published. See Validating Taxonomy Classifiers for the method I used to compare its accuracy with that of other algorithms.
At a high level, UTAX is a k-mer based method which looks for words in common between the query sequence and reference sequences with known taxonomy. A score calculated from word counts is used to estimate a confidence value for each taxonomic level. Confidence values are trained to give a realistic estimate of error rates, in contrast to the bootstrap values reported by RDP which are poor predictors of error rates in practice.
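
To make the idea of "words in common" concrete, here is a minimal Python sketch that counts the distinct k-mers a query shares with each reference sequence and reports the reference with the most shared words. It is an illustration only and is not the UTAX algorithm, which is unpublished: the function names (kmers, shared_word_counts, top_hit), the default word length k=8 and the use of a raw shared-word count as the score are all assumptions made for this example. It does not attempt the step of converting a score into a trained confidence value.

```python
def kmers(seq, k=8):
    """Return the set of distinct words of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_counts(query, references, k=8):
    """Count the distinct k-mers shared by the query and each reference.

    references maps a taxonomy label to a reference sequence.
    The raw count stands in for a score; this is an assumption,
    not the scoring function used by UTAX.
    """
    query_words = kmers(query, k)
    return {label: len(query_words & kmers(ref, k))
            for label, ref in references.items()}


def top_hit(query, references, k=8):
    """Return (label, count) for the reference sharing the most words."""
    counts = shared_word_counts(query, references, k)
    best = max(counts, key=counts.get)
    return best, counts[best]


if __name__ == "__main__":
    # Toy sequences and labels, invented for the example.
    refs = {
        "genus_A": "ACGTACGTAA",
        "genus_B": "TTTTGGGGCC",
    }
    print(top_hit("ACGTACGTAA", refs, k=4))
```

A real classifier would index the reference words once instead of rebuilding the word sets for every query, and would map the score to a confidence value at each taxonomic level using training data.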