UTAX algorithm

See also
  UTAX reference data downloads
  How do I create my own taxconfs file?
  Validating taxonomy classifier algorithms
  UTAX and the RDP classifier compared on fungal ITS

UTAX is an algorithm for taxonomy assignment. It is implemented in the utax command.

The main advantages of UTAX are very high speed and P-values that predict the actual error rate.

The algorithm has not yet been published. See Validating Taxonomy Classifiers for the method I used to compare its accuracy with that of other algorithms, in particular the RDP Classifier (RDPC). See results on ITS.

At a high level, UTAX is a word-counting method similar to the RDP Naive Bayesian Classifier. It uses the "U-sorting" strategy of the USEARCH algorithm to search the reference database without alignments, because my testing found no significant improvement from using alignments. The fractional word counts are used to calculate a score and a P-value for each taxonomic level. The P-values are obtained by curve-fitting to empirical results on training data, so they give a realistic estimate of the error rate at every taxonomic level. In contrast, the bootstrap values reported by the RDP Classifier do not predict true error rates.
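The general idea of alignment-free word counting can be illustrated with a minimal Python sketch. This is not the UTAX implementation, which is unpublished. The word length k=8, the function names, the toy sequences and the simple binned calibration (used here in place of curve-fitting) are all assumptions made for illustration only.

```python
def words(seq, k=8):
    """Return the set of unique k-letter words (k-mers) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_fraction(query, reference, k=8):
    """Fraction of the query's unique words that also occur in the reference."""
    q = words(query, k)
    if not q:
        return 0.0  # query shorter than k: no words to compare
    return len(q & words(reference, k)) / len(q)


def rank_references(query, references, k=8):
    """Sort (label, sequence) references by decreasing shared-word fraction.

    No alignment is computed; only word membership is compared.
    """
    scored = [(word_fraction(query, seq, k), label) for label, seq in references]
    return sorted(scored, key=lambda t: t[0], reverse=True)


def calibrate(training, n_bins=10):
    """Empirical accuracy per score bin from (score, was_correct) pairs.

    Hypothetical stand-in for curve-fitting: each bin holds the observed
    fraction of correct predictions, or None if the bin is empty.
    """
    totals = [0] * n_bins
    correct = [0] * n_bins
    for score, ok in training:
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        totals[b] += 1
        correct[b] += int(ok)
    return [correct[b] / totals[b] if totals[b] else None for b in range(n_bins)]


# Toy reference database; labels and sequences are invented for this example.
references = [
    ("Genus_A", "ACGTACGTTAGCCGATAGCTAGGATCCA"),
    ("Genus_B", "TTGACCGGTATTCAGGCATCGGATTACA"),
]
query = "ACGTACGTTAGCCGATAGCTAGG"
ranking = rank_references(query, references)
```

Here the query shares all of its words with Genus_A and none with Genus_B, so Genus_A is ranked first. In a calibrated classifier, the score of the top hit would then be mapped to an estimated probability of being correct using results measured on training data, which is the role the binned table returned by calibrate plays in this sketch.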