SINTAX

About SINTAX
The SINTAX algorithm predicts taxonomy by using k-mer similarity to identify the top hit in a reference database, and it provides a bootstrap confidence value for every rank in the prediction. SINTAX achieves accuracy comparable to or better than the RDP Naive Bayesian Classifier with a simpler algorithm that does not require training. In particular, SINTAX has significantly lower false-positive rates on full-length 16S and ITS sequences because it over-classifies less often.
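The idea of a k-mer top hit with bootstrap confidence can be illustrated with a toy sketch. This is not the USEARCH implementation. The function names are hypothetical, and the parameter values (k = 8, 32 subsampled words, 100 iterations) and the tie-breaking rule are assumptions made for illustration, loosely following the description in the paper cited below.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Return the set of distinct k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_sketch(query, references, k=8, sample_size=32, iterations=100, seed=0):
    """Toy top-hit classifier with bootstrap support per rank.

    references: list of (taxonomy, sequence) pairs, where taxonomy is a
    list of names ordered from highest to lowest rank.
    Returns a list of (name, confidence) pairs, one per rank.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    query_words = sorted(kmers(query, k))
    if not query_words:
        raise ValueError("query is shorter than k")
    ref_words = [(tuple(tax), kmers(seq, k)) for tax, seq in references]

    top_hits = []
    for _ in range(iterations):
        # Subsample query words with replacement (the bootstrap step).
        sample = set(rng.choices(query_words, k=sample_size))
        # Top hit = reference sharing the most sampled words.
        # Ties go to the first reference here (a simplification).
        best_tax, _ = max(ref_words, key=lambda r: len(sample & r[1]))
        top_hits.append(best_tax)

    # Predict the taxonomy that was the top hit most often.
    predicted = Counter(top_hits).most_common(1)[0][0]

    # Confidence at each rank = fraction of iterations whose top hit
    # agrees with the prediction down to that rank.
    result = []
    for depth, name in enumerate(predicted, start=1):
        support = sum(1 for tax in top_hits if tax[:depth] == predicted[:depth])
        result.append((name, support / iterations))
    return result


if __name__ == "__main__":
    # Made-up reference sequences, for demonstration only.
    refs = [
        (["d:Bacteria", "p:Firmicutes"], "ACGT" * 10),
        (["d:Bacteria", "p:Proteobacteria"], "TTGGCCAA" * 5),
    ]
    for name, conf in sintax_sketch("ACGT" * 10, refs):
        print(name, conf)
```

Because the higher ranks are shared by more reference sequences, their support can only be equal to or higher than that of lower ranks, which is what lets a prediction be truncated at the deepest rank that still meets a chosen confidence cutoff.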

SINTAX is supported by the sintax command in open-source USEARCH.

Paper
Edgar, Robert C. "SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences." bioRxiv (2016): 074161.

Reference databases
The following FASTA files have been reformatted with SINTAX-compatible taxonomy annotations.
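As a rough illustration of what such an annotation looks like, the sketch below parses a sequence label. It assumes the `tax=` layout described in the USEARCH documentation, for example `>AB008314;tax=d:Bacteria,p:Firmicutes,g:Streptococcus;`. The layout is not spelled out on this page, so treat both the format and the example label as assumptions. The function name and the rank table are hypothetical helpers.

```python
# Single-letter rank codes assumed to be used in tax= annotations.
RANKS = {
    "d": "domain", "k": "kingdom", "p": "phylum", "c": "class",
    "o": "order", "f": "family", "g": "genus", "s": "species",
}


def parse_sintax_label(label):
    """Split a FASTA label into (sequence_id, [(rank, name), ...]).

    Assumes semicolon-separated fields with an optional tax= field whose
    value is a comma-separated list of code:name pairs. Quoted names that
    contain commas are not handled in this sketch.
    """
    label = label.lstrip(">").strip()
    fields = [f for f in label.split(";") if f]
    seq_id = fields[0]
    taxonomy = []
    for field in fields[1:]:
        if field.startswith("tax="):
            for item in field[len("tax="):].split(","):
                code, _, name = item.partition(":")
                # Unknown codes are passed through unchanged.
                taxonomy.append((RANKS.get(code, code), name))
    return seq_id, taxonomy


if __name__ == "__main__":
    example = ">AB008314;tax=d:Bacteria,p:Firmicutes,g:Streptococcus;"
    print(parse_sintax_label(example))
```

A quick check like this on the first few labels of a downloaded file can confirm that the annotations are in the expected form before the file is used as a reference database.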

16S
rdp_16s_v16.fa.gz RDP training set v16 (13k seqs.). RDP license terms.
rdp_16s_v16_sp.fa.gz RDP training set with species names (can species be predicted?).
gg_16s_13.5.fa.gz Greengenes v13.5 (1.2M seqs.). Greengenes license terms.
silva_16s_v123.fa.gz SILVA v123 (1.6M seqs.). SILVA license terms.
ltp_16s_v123.fa.gz SILVA v123 LTP named isolate subset (12k seqs.). SILVA license terms.

ITS
UNITE, current version at unite.ut.ee (53k seqs. in v7.1). UNITE license terms.
rdp_its_v2.fa.gz RDP Warcup training set v2 (18k seqs.). RDP license terms.