The SINTAX algorithm predicts the taxonomy of marker gene
reads such as 16S or ITS. It is implemented in the
sintax command. See the SINTAX paper for details.
Bootstrap confidence values are provided for all predicted ranks.
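A common use of per-rank confidence values is to truncate a prediction at the first rank whose confidence falls below a cutoff. The sketch below assumes a prediction string made of comma-separated `rank:name(confidence)` items, which is how I understand sintax reports its predictions; check this against your own output before relying on it. The function names and the 0.8 cutoff are illustrative choices, not part of the sintax command.

```python
import re

# Assumed item format: "rank:name(confidence)", e.g. "p:Firmicutes(0.9500)".
_ITEM = re.compile(r"^(\w+):(.+)\((\d*\.?\d+)\)$")


def parse_prediction(pred: str) -> list[tuple[str, str, float]]:
    """Split an assumed prediction string into (rank, name, confidence) triples."""
    triples = []
    for item in pred.split(","):
        m = _ITEM.match(item.strip())
        if m is None:
            raise ValueError(f"unrecognized item: {item!r}")
        rank, name, conf = m.groups()
        triples.append((rank, name, float(conf)))
    return triples


def truncate_at_cutoff(pred: str, cutoff: float = 0.8) -> str:
    """Keep ranks from the top down while confidence >= cutoff (hypothetical helper)."""
    kept = []
    for rank, name, conf in parse_prediction(pred):
        if conf < cutoff:
            break  # stop at the first low-confidence rank; lower ranks are dropped too
        kept.append(f"{rank}:{name}")
    return ",".join(kept)


print(truncate_at_cutoff("d:Bacteria(1.0000),p:Firmicutes(0.9500),c:Bacilli(0.7000)"))
# prints "d:Bacteria,p:Firmicutes"
```

Stopping at the first rank below the cutoff keeps the reported taxonomy hierarchical: a lower rank is never reported without its parent.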
The algorithm is similar to the RDP Naive Bayesian
Classifier, except that k-mer similarity, rather than Bayesian posteriors,
is used to identify the top taxonomy, so no training step is needed.
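The idea of bootstrapped k-mer similarity can be illustrated with a small, simplified sketch. Each iteration samples k-mers from the query with replacement, finds the reference that shares the most sampled k-mers, and records that reference's taxonomy; the confidence at each rank is the fraction of iterations that agree with the chosen name. The parameter values (k=8, 32 k-mers per iteration, 100 iterations), the tie-breaking rule, and the rank-by-rank voting are assumptions for illustration, not a definitive description of the sintax implementation.

```python
import random
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers (words of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_sketch(query, refs, k=8, subset_size=32, iterations=100, seed=0):
    """Simplified bootstrapped k-mer classifier (illustrative, not the real sintax).

    refs: list of (sequence, taxonomy); taxonomy is a tuple of names from the
    highest to the lowest rank and may stop early (missing low ranks allowed).
    Returns (predicted names, per-rank confidences).
    """
    query_kmers = sorted(kmers(query, k))
    if not query_kmers:
        raise ValueError("query is shorter than k")
    ref_kmers = [(kmers(seq, k), tax) for seq, tax in refs]
    rng = random.Random(seed)

    hits = []
    for _ in range(iterations):
        # Bootstrap: sample query k-mers with replacement.
        sample = [rng.choice(query_kmers) for _ in range(subset_size)]
        # Top hit = reference sharing the most sampled k-mers (first one wins ties).
        best = max(ref_kmers, key=lambda r: sum(w in r[0] for w in sample))
        hits.append(best[1])

    predicted: list[str] = []
    confidences: list[float] = []
    depth = 0
    while True:
        # Vote at this rank among hits consistent with the ranks chosen so far.
        counts = Counter(
            tax[depth] for tax in hits
            if len(tax) > depth and list(tax[:depth]) == predicted
        )
        if not counts:
            break
        name, n = counts.most_common(1)[0]
        predicted.append(name)
        confidences.append(n / iterations)
        depth += 1
    return predicted, confidences
```

Because only similarity counts against the reference set are needed, there is no model to fit in advance: adding or editing reference sequences takes effect immediately.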
Also, SINTAX does not require the lowest ("training") rank to be specified
for every reference sequence, which allows large databases such as
SILVA or Greengenes to be used as a reference.
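To make the point about incomplete ranks concrete, the sketch below parses taxonomy annotations written into FASTA labels as `tax=` followed by comma-separated `rank:name` pairs, which is how I understand USEARCH-style reference files to be annotated. The labels and the helper name are hypothetical. One record stops at genus and another reaches species; both are valid references because no particular lowest rank is required.

```python
def parse_tax_annotation(label: str) -> dict[str, str]:
    """Return {rank_letter: name} from a label like 'X;tax=d:Bacteria,p:Firmicutes;'.

    Hypothetical helper; assumes the 'tax=' field is one ';'-separated field.
    """
    for field in label.split(";"):
        if field.startswith("tax="):
            ranks = {}
            for item in field[len("tax="):].split(","):
                if not item:
                    continue  # tolerate empty items such as a trailing comma
                rank, _, name = item.partition(":")
                ranks[rank] = name
            return ranks
    return {}


# Hypothetical reference labels: one stops at genus, one goes down to species.
genus_only = parse_tax_annotation("Ref1;tax=d:Bacteria,p:Firmicutes,g:Bacillus;")
to_species = parse_tax_annotation("Ref2;tax=d:Bacteria,p:Firmicutes,g:Bacillus,s:Bacillus_subtilis;")
print("s" in genus_only, "s" in to_species)  # prints "False True"
```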
On short tags such as V4, SINTAX has accuracy similar
to RDP. On full-length 16S and ITS sequences, SINTAX has a lower rate of
over-classification errors and therefore a
lower overall error rate on typical data.