Use a small database with authoritative classifications
	The database is the easier decision: I recommend using a small set of 
	authoritatively classified sequences, e.g. for 16S that could be the RDP 
	training set or the SILVA LTP subset. 
	With the big databases (SILVA, Greengenes, RDP), you 
	have the problem of unknown rates of annotation error and ambiguous blank 
	names. Is a blank name a high-confidence prediction that the group is not 
	named, or is it probably a known name with a confidence that is too low to 
	annotate but high enough that you would want to know about it, say P=0.8? 
	It could be either.
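To make the blank-name ambiguity concrete, here is a minimal Python sketch. 
It assumes annotations are produced by truncating a per-rank prediction at a 
confidence cutoff; the function name, the cutoff of 0.8 and the confidence 
values are invented for illustration and do not describe how any particular 
database was actually built.

```python
# Hypothetical illustration: two very different predictions become the same
# annotation once low-confidence ranks are blanked out.

def truncate_at_cutoff(ranks, cutoff=0.8):
    """Keep leading ranks with confidence >= cutoff and drop the rest.

    ranks: list of (rank_letter, name, confidence), highest rank first.
    Returns a list of (rank_letter, name) for the ranks that were kept.
    """
    kept = []
    for rank, name, conf in ranks:
        if conf < cutoff:
            break  # this rank and everything below it becomes blank
        kept.append((rank, name))
    return kept

# Made-up confidences: the genus is "probably right" in one case and
# essentially unknown in the other.
probably_known = [("d", "Bacteria", 1.00),
                  ("f", "Enterobacteriaceae", 0.95),
                  ("g", "Escherichia", 0.79)]
probably_novel = [("d", "Bacteria", 1.00),
                  ("f", "Enterobacteriaceae", 0.95),
                  ("g", "Escherichia", 0.05)]

print(truncate_at_cutoff(probably_known))
print(truncate_at_cutoff(probably_novel))
```

Both calls return the same two ranks with a blank genus, so the annotation 
alone cannot tell you which of the two cases you are looking at.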
The full RDP database is definitely not a good choice because 
	the taxonomies were predicted by the RDP Classifier, which has a high rate 
	of over-classification errors on full-length sequences (see the 
	SINTAX paper). If you want predictions using 
	Bergey's nomenclature, then I would recommend using the RDP training set 
	with SINTAX or UTAX.
The bottom line: when you add a second layer of 
	taxonomy prediction (for your sequences) on top of ambiguous or error-prone 
	predictions in a big database, the results are hard or impossible to 
	interpret in a meaningful way.
UTAX and SINTAX have different strengths and weaknesses
SINTAX is brand new so I don't have 
	much experience with it yet (this was written just after version 9 was 
	released). On short 16S tags like V4, SINTAX and RDP have very similar 
	performance. On longer 16S sequences and on ITS sequences, SINTAX is better 
	than RDP. SINTAX is simpler because it doesn't need training, while training 
	UTAX or RDP is quite challenging if you want to use your own database. UTAX 
	is the only algorithm which tries to account for sparse reference data and 
	has the lowest over-classification rate of any algorithm (except possibly 
	the k-nearest-neighbor method in mothur, but knn has low sensitivity in 
	general). However, UTAX sometimes has lower sensitivity than SINTAX to known 
	taxa. Neither algorithm is a clear winner over the other.
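
The trade-off between over-classification and sensitivity can be sketched 
with a small Python function. The definitions used here are simplified 
assumptions for illustration: sensitivity is the fraction of queries whose 
true name is in the reference that get the correct name, and the 
over-classification rate is the fraction of queries whose true name is absent 
from the reference that are given a name anyway. The function name and the 
toy data are hypothetical and are not benchmark results.

```python
# Hypothetical sketch of the two metrics at a single rank (e.g. genus).

def rank_metrics(pairs):
    """Compute (sensitivity, over_classification_rate).

    pairs: list of (true_name, predicted_name).
      true_name is None when the query's taxon is absent from the reference.
      predicted_name is None when the classifier makes no prediction.
    """
    known = [(t, p) for t, p in pairs if t is not None]
    novel = [(t, p) for t, p in pairs if t is None]

    correct = sum(1 for t, p in known if p == t)
    over = sum(1 for _, p in novel if p is not None)

    sensitivity = correct / len(known) if known else float("nan")
    over_rate = over / len(novel) if novel else float("nan")
    return sensitivity, over_rate

# Toy data with invented names: four queries with known taxa, two novel.
toy = [("A", "A"),    # known, correct
       ("B", None),   # known, no prediction (lost sensitivity)
       ("C", "C"),    # known, correct
       ("D", "X"),    # known, wrong name
       (None, "A"),   # novel, but a name was predicted (over-classification)
       (None, None)]  # novel, correctly left blank

print(rank_metrics(toy))
```

A classifier that predicts less aggressively would lower the second number 
but would usually also lower the first, which is the sense in which neither 
algorithm wins on both counts.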