UTAX algorithm

See also
  Should I use UTAX or SINTAX? Which database?
  UTAX reference data downloads
  utax command
  cluster_otus_utax command
  makeudb_utax command
  Taxonomy predictions
  Taxonomy confidence
  Taxonomy training

  Taxonomy benchmark results

UTAX is an algorithm for taxonomy assignment which is implemented in the utax command. The cluster_otus_utax command generates OTUs based on taxa predicted by UTAX.

The main advantages of UTAX over previous classifiers such as the RDP Naive Bayesian Classifier (RDP) are very high speed, informative confidence values and flexible options for training on user-supplied data.

The algorithm has not yet been published. See Validating Taxonomy Classifiers for the method I used to validate its accuracy compared with other algorithms.

At a high level, UTAX is a k-mer-based method that looks for words in common between the query sequence and reference sequences with known taxonomy. A score calculated from word counts is used to estimate a confidence value for each taxonomic level. Confidence values are trained to give a realistic estimate of error rates, in contrast to the bootstrap values reported by RDP, which are poor predictors of error rates in practice.
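The idea of counting words in common can be illustrated with a short Python sketch. This is not the UTAX implementation, which is unpublished. It only shows a generic shared k-mer count between a query and labeled references. The function names, the word length k and the toy sequences are assumptions made for illustration, and the trained mapping from score to confidence is not shown.

```python
def kmers(seq, k=8):
    """Return the set of distinct words (k-mers) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_counts(query, references, k=8):
    """Count distinct words shared by the query and each reference.

    references maps a taxonomy label (hypothetical format) to a sequence.
    Returns a dict mapping each label to its shared word count.
    """
    query_words = kmers(query, k)
    return {
        label: len(query_words & kmers(seq, k))
        for label, seq in references.items()
    }


def best_hit(query, references, k=8):
    """Return (label, count) for the reference sharing the most words."""
    counts = shared_word_counts(query, references, k)
    label = max(counts, key=counts.get)
    return label, counts[label]


# Toy data, invented for this example only.
toy_refs = {
    "g:A": "ACGTACGTTTGACC",
    "g:B": "TTTTGGGGCCCCAAAA",
}
toy_query = "ACGTACGTTTGA"
print(best_hit(toy_query, toy_refs, k=4))
```

A raw count like this is only a similarity score. Turning it into a per-level confidence value requires the training step described above.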