FAQ: Should I retrain UTAX for my region / read length?

See also
  UTAX algorithm
  utax command
  How to train UTAX on your own reference data

If your 16S or ITS reads differ in length or region from those used for the pre-trained taxconfs files, retraining may not be worth the effort. It is often perfectly reasonable to choose the taxconfs file whose region and length are closest to your reads.

Retraining the parameters will not change the predicted taxon names. UTAX always predicts the taxonomy of the top hit in the reference database (sorted by U = unique k-mer identity); only the estimated confidence values can change. The main effect of retraining is to shift all confidence values up or down by roughly the same amount at each rank, so retraining has an effect similar to moving the P cutoff up or down.
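Why a roughly uniform shift behaves like a change of cutoff can be seen with a toy example. If every confidence at a given rank moves by the same amount delta, a read passes cutoff t after the shift exactly when it passed t - delta before. The Python sketch below checks this on made-up genus-level confidences; the additive-shift model, the numbers and the function names are illustrative assumptions, not UTAX's actual retraining procedure.

```python
def passing(confidences, cutoff):
    """Indices of predictions whose confidence is at or above the cutoff."""
    return {i for i, p in enumerate(confidences) if p >= cutoff}


def shifted(confidences, delta):
    """Toy model of retraining (an assumption): add the same delta to every
    confidence, clipped to [0, 1]. For cutoffs in (0, 1] the clipping does
    not change which reads pass."""
    return [min(1.0, max(0.0, p + delta)) for p in confidences]


def same_reads_pass(confidences, delta, cutoff):
    """True if `cutoff` on the shifted values keeps the same reads as
    `cutoff - delta` on the original values."""
    return passing(shifted(confidences, delta), cutoff) == passing(
        confidences, cutoff - delta
    )


# Made-up genus-level confidences for six reads.
genus_conf = [0.97, 0.91, 0.84, 0.72, 0.55, 0.30]

for delta in (-0.1, -0.05, 0.05, 0.1):
    kept = sorted(passing(shifted(genus_conf, delta), 0.9))
    print(delta, kept, same_reads_pass(genus_conf, delta, 0.9))
```

The same argument holds for any change that preserves the ordering of confidence values at a rank. Because real retraining shifts values only roughly uniformly, the correspondence with a moved cutoff is approximate rather than exact.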

Does it matter for your analysis if the confidence values change by a small amount? Note that the interpretation of a confidence value is not very clear. In particular, setting a cutoff of P > 0.9 does not mean you should expect a 10% error rate (see taxonomy confidence for discussion); the actual error rate could be much higher or lower, depending on your data.

One way to get a sense of how big the change might be is to run UTAX on your data with two different sets of parameters. For example, suppose your reads have length 350: try running utax with the parameters for lengths 250 and 500. How big is the change? Is it very different from what you would get by changing the cutoff? Check a few examples by hand. Once you have done this, you may feel comfortable using one of the existing taxconfs files, perhaps with a different cutoff.
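Part of this comparison can be scripted. The Python sketch below reads the tabbed output of two UTAX runs on the same reads and, for one rank, reports the mean change in confidence, how many reads pass the cutoff in each run, and how many reads from the first run would pass if the cutoff were simply moved by that mean change. It also counts reads whose predicted name differs between runs, which should stay at zero. It assumes each output line has the read label in the first field and the prediction in the second, written as comma-separated rank:name(confidence) entries such as g:Bacillus(0.8712). Check this against your own output and adjust parse_line if the layout differs. The function names and sample lines are hypothetical.

```python
import re

# One "rank:name(confidence)" entry, e.g. "g:Bacillus(0.8712)" (assumed layout).
ENTRY = re.compile(r"^(\w+):(.+)\(([0-9.]+)\)$")


def parse_line(line):
    """Return (label, {rank: (name, confidence)}) for one tab-separated line.
    Assumes field 1 is the read label and field 2 the prediction."""
    fields = line.rstrip("\n").split("\t")
    ranks = {}
    for entry in fields[1].split(","):
        m = ENTRY.match(entry)
        if m:
            rank, name, conf = m.groups()
            ranks[rank] = (name, float(conf))
    return fields[0], ranks


def compare_runs(lines_a, lines_b, rank, cutoff):
    """Compare one rank between two runs (A and B) on the same reads."""
    run_a = dict(parse_line(line) for line in lines_a if line.strip())
    run_b = dict(parse_line(line) for line in lines_b if line.strip())
    conf_a, conf_b, name_changes = [], [], 0
    for label, ranks_a in run_a.items():
        ranks_b = run_b.get(label, {})
        if rank not in ranks_a or rank not in ranks_b:
            continue  # read missing from one run, or not classified at this rank
        (name_a, ca), (name_b, cb) = ranks_a[rank], ranks_b[rank]
        name_changes += name_a != name_b  # expected to stay 0
        conf_a.append(ca)
        conf_b.append(cb)
    if not conf_a:
        return None
    mean_shift = sum(b - a for a, b in zip(conf_a, conf_b)) / len(conf_a)
    return {
        "mean_shift": mean_shift,
        "pass_a": sum(c >= cutoff for c in conf_a),
        "pass_b": sum(c >= cutoff for c in conf_b),
        # Run A with the cutoff moved by the mean shift instead of retraining.
        "pass_a_moved_cutoff": sum(c >= cutoff - mean_shift for c in conf_a),
        "name_changes": name_changes,
    }


# Hypothetical output lines from runs with the 250 and 500 parameters.
run_250 = [
    "read1\td:Bacteria(1.0000),p:Firmicutes(0.9800),g:Bacillus(0.9100)",
    "read2\td:Bacteria(1.0000),p:Proteobacteria(0.9500),g:Pseudomonas(0.7400)",
]
run_500 = [
    "read1\td:Bacteria(1.0000),p:Firmicutes(0.9900),g:Bacillus(0.9500)",
    "read2\td:Bacteria(1.0000),p:Proteobacteria(0.9700),g:Pseudomonas(0.8200)",
]
print(compare_runs(run_250, run_500, "g", 0.9))
```

With real files, pass the lines of each output file (for example from open(path).readlines()). If pass_b is close to pass_a_moved_cutoff, switching parameter sets is acting much like a change of cutoff. Looking at a few individual reads by hand is still worthwhile.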

The biggest problems in taxonomy prediction are (1) reference databases are missing most species, and (2) taxonomies were mostly assigned before molecular sequence data was available, so there are conflicts and inconsistencies between sequence and taxonomy. Predictions will therefore be unreliable and noisy whatever you do.