
SINTAX downloads

See also
  • Microbial taxonomy
  • Which taxonomy database should I use?

FASTA files reformatted with SINTAX-compatible taxonomy annotations

   rdp_16s_v18.fa.gz RDP training set v18 (21k seqs.). RDP license terms.
   rdp_16s_v16.fa.gz RDP training set v16 (13k seqs.). RDP license terms.
   rdp_16s_v16_sp.fa.gz RDP training set v16 with species names (not recommended; see "Can species be predicted?").
   gg_16s_13.5.fa.gz Greengenes v13.5 (1.2M seqs.) (not recommended). Greengenes license terms.
   silva_16s_v123.fa.gz SILVA v123 (1.6M seqs.) (not recommended). SILVA license terms.
   ltp_16s_v123.fa.gz SILVA v123 LTP named isolate subset (12k seqs.). SILVA license terms.
   UNITE (current "utax" version at unite.ut.ee) (53k seqs. in v7.1). UNITE license terms.
   rdp_its_v2.fa.gz RDP Warcup training set v2 (18k seqs.). RDP license terms.
   silva_18s_v123.fa.gz SILVA v123 eukaryotic 18S subset (140k seqs.). SILVA license terms.
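The files listed above are gzipped FASTA files whose sequence labels carry the taxonomy. This page does not spell out the annotation syntax, so the following is only a minimal Python sketch that assumes the USEARCH-style ";tax=" field, in which ranks are comma-separated rank:name pairs (for example d: domain, p: phylum, g: genus) terminated by ";". The helper names read_fasta and parse_tax are hypothetical.

```python
import gzip
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs from a FASTA file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    label: Optional[str] = None
    chunks: List[str] = []
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if label is not None:
                    yield label, "".join(chunks)
                label, chunks = line[1:], []
            elif line:
                chunks.append(line)  # sequences may be wrapped over several lines
    if label is not None:
        yield label, "".join(chunks)


def parse_tax(label: str) -> Tuple[str, Dict[str, str]]:
    """Split a label such as 'AB001;tax=d:Bacteria,g:Bacillus;'
    into ('AB001', {'d': 'Bacteria', 'g': 'Bacillus'}).

    Assumes the USEARCH-style ';tax=' field (an assumption, not stated
    on this page); labels without it return an empty rank dictionary.
    """
    seq_id, _, rest = label.partition(";tax=")
    tax_field = rest.split(";", 1)[0]  # stop at the next ';'-delimited field
    ranks: Dict[str, str] = {}
    for item in tax_field.split(","):
        if ":" in item:
            code, name = item.split(":", 1)  # rank letter, taxon name
            ranks[code] = name
    return seq_id, ranks
```

For instance, once rdp_16s_v18.fa.gz has been downloaded, iterating over read_fasta("rdp_16s_v18.fa.gz") and reading parse_tax(label)[1].get("g") for each record would give the genus of every reference, or None where no genus is annotated.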

References (please cite)
R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, bioRxiv 074161, https://doi.org/10.1101/074161
  • SINTAX taxonomy prediction algorithm

  • Fast, simple method with accuracy comparable to the RDP Classifier
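The bullets above describe SINTAX only at a high level. Below is a simplified Python sketch of a k-mer voting classifier in the spirit of the 2016 preprint; it is not the USEARCH implementation. In each iteration it samples words from the query, takes the reference sharing the most sampled words as the top hit, and finally reports, for each rank of the most frequently voted lineage, the fraction of iterations that agree with it down to that rank. The word length (8), subset size (32), iteration count (100), sampling with replacement, and the function names words and classify are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import List, Set, Tuple

K = 8        # assumed word (k-mer) length
SUBSET = 32  # assumed number of query words sampled per iteration
ITERS = 100  # assumed number of iterations


def words(seq: str, k: int = K) -> Set[str]:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query: str, refs: List[Tuple[str, List[str]]],
             seed: int = 1) -> List[Tuple[str, float]]:
    """Predict a lineage for query from refs, a list of
    (sequence, [taxon names from domain downwards]) pairs.

    Returns [(name, confidence), ...], one entry per rank of the most
    frequently voted lineage; [] if the query has no k-mers or refs is empty.
    """
    qwords = sorted(words(query))  # sorted so a fixed seed is reproducible
    if not qwords or not refs:
        return []
    rng = random.Random(seed)
    ref_words = [words(seq) for seq, _ in refs]
    votes: List[Tuple[str, ...]] = []
    for _ in range(ITERS):
        # Sample query words with replacement (an assumption of this sketch).
        sample = [rng.choice(qwords) for _ in range(SUBSET)]
        # Top hit: the reference containing the most sampled words;
        # max() keeps the first reference on ties.
        best = max(range(len(refs)),
                   key=lambda j: sum(w in ref_words[j] for w in sample))
        votes.append(tuple(refs[best][1]))
    lineage = Counter(votes).most_common(1)[0][0]
    result = []
    for depth, name in enumerate(lineage):
        prefix = lineage[:depth + 1]
        agree = sum(1 for v in votes if v[:depth + 1] == prefix)
        result.append((name, agree / ITERS))
    return result
```

This sketch scans every reference in every iteration, which is fine for illustration but slow for databases of the sizes listed above; a practical classifier would index reference words instead.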

R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
  • Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates

  • Genus-level accuracy of the best methods is 50% on V4 sequences

  • Recent algorithms do not improve on the RDP Classifier or SINTAX

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
  • Approx. one in five SILVA and Greengenes taxonomy annotations are wrong

  • SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies