About SINTAX
The SINTAX algorithm predicts taxonomy by using k-mer similarity to identify the top hit in a reference database, and it provides a bootstrap confidence value for every rank in the prediction. SINTAX achieves accuracy similar to or better than that of the RDP Naive Bayesian Classifier, using a simpler algorithm that does not require training. In particular, SINTAX has significantly lower false positive rates on full-length 16S and ITS sequences because it over-classifies less often.
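
As a rough illustration of the idea described above, the following Python sketch finds the reference sharing the most k-mers with the query, then repeats the search on random subsets of the query's k-mers to estimate a per-rank bootstrap confidence. It is a simplified reading of the description, not the USEARCH implementation: the function names, the tie-breaking rule, and the default values (k = 8, 32 words per subset, 100 iterations) are assumptions made for this example.

```python
import random


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(words, ref_kmers):
    """Index of the reference sharing the most k-mers with `words`.

    Ties go to the lowest index (an assumption of this sketch).
    """
    return max(range(len(ref_kmers)), key=lambda i: len(words & ref_kmers[i]))


def classify(query, refs, k=8, subsample=32, iterations=100, seed=0):
    """Predict taxonomy for `query` with per-rank bootstrap confidence.

    refs: list of (sequence, taxonomy) pairs, where taxonomy is a tuple of
    rank names ordered from highest to lowest rank.
    Returns a list of (rank_name, confidence) pairs.
    """
    rng = random.Random(seed)
    ref_kmers = [kmers(seq, k) for seq, _ in refs]
    query_words = sorted(kmers(query, k))
    if not query_words:
        raise ValueError("query is shorter than k")

    # Prediction: taxonomy of the top hit using all query k-mers.
    predicted = refs[best_reference(set(query_words), ref_kmers)][1]

    # Bootstrap: count how often a random subset's top hit agrees at each rank.
    agree = [0] * len(predicted)
    for _ in range(iterations):
        sample = set(rng.choices(query_words, k=subsample))
        hit_tax = refs[best_reference(sample, ref_kmers)][1]
        for rank, name in enumerate(predicted):
            if rank < len(hit_tax) and hit_tax[rank] == name:
                agree[rank] += 1

    return [(name, agree[rank] / iterations) for rank, name in enumerate(predicted)]
```

Because no model is fitted in advance, the only preparation step is extracting k-mers from the reference sequences, which is consistent with the statement that the method does not require training.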

SINTAX is supported by the sintax command in open-source USEARCH.

Paper
Edgar, Robert C. "SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences." bioRxiv (2016): 074161.
 
Reference databases
FASTA files reformatted with SINTAX-compatible taxonomy annotations.
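
For orientation, the sketch below parses a taxonomy annotation from a FASTA header. It assumes the label convention described in the USEARCH documentation, in which the sequence identifier is followed by a `tax=` field holding comma-separated `rank:name` pairs (for example `d:` for domain and `g:` for genus) and fields are separated by semicolons. The function name and the example header are hypothetical, and quoted names containing commas are not handled.

```python
def parse_tax_label(label):
    """Split a FASTA header into (sequence_id, {rank_letter: name}).

    Assumes a header of the form '>ID;tax=d:Name,p:Name,...;'.
    """
    label = label.lstrip(">").strip()
    fields = [f for f in label.split(";") if f]
    seq_id = fields[0]
    ranks = {}
    for field in fields[1:]:
        if field.startswith("tax="):
            for item in field[len("tax="):].split(","):
                rank, _, name = item.partition(":")
                ranks[rank] = name
    return seq_id, ranks


# Hypothetical header, for illustration only.
seq_id, ranks = parse_tax_label(">X1;tax=d:Bacteria,p:Firmicutes,g:Bacillus;")
```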

16S
rdp_16s_v16.fa.gz RDP training set v16 (13k seqs.). RDP license terms.
rdp_16s_v16_sp.fa.gz RDP training set with species names (can species be predicted?).
gg_16s_13.5.fa.gz Greengenes v13.5 (1.2M seqs.). Greengenes license terms.
silva_16s_v123.fa.gz SILVA v123 (1.6M seqs.). SILVA license terms.
ltp_16s_v123.fa.gz SILVA v123 LTP named isolate subset (12k seqs.). SILVA license terms.


ITS
UNITE (current version at unite.ut.ee) (53k sequences in v7.1). UNITE license terms.
rdp_its_v2.fa.tz RDP Warcup training set v2 (18k sequences). RDP license terms.