Sequence databases with taxonomy classifications
Taxonomy annotation errors in large databases
Taxonomy database downloads
Use a small database with authoritative classifications
I recommend using authoritatively classified sequences, e.g. for 16S the most recent RDP training set or LTP release.
Taxonomy annotations in large databases are unreliable predictions
The taxonomy annotations in the large 16S databases (SILVA, Greengenes, and the full RDP database) are mostly computational predictions from 16S sequences. Roughly one in five of these predictions is wrong, probably because the guide trees have pervasive branching order errors. Using annotations from a large database therefore adds a substantial error rate in the reference data on top of the intrinsic error rate of a prediction algorithm such as SINTAX or the Naive Bayesian Classifier. For this reason, I believe it is best to use a database of type strain and isolate sequences rather than Greengenes, SILVA or the full RDP database.
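To get a rough feel for how reference errors stack on top of algorithm errors, here is a minimal Python sketch. It rests on simplifying assumptions that are not from the text above: that the two error sources are independent, and that a reference error never happens to cancel an algorithm error. The 20% reference error rate is the "roughly one in five" figure; the 10% algorithm error rate is a purely hypothetical value chosen for illustration, and the function name is made up for this example.

```python
def combined_error_rate(reference_error: float, algorithm_error: float) -> float:
    """Chance that a prediction is wrong when either the reference annotation
    or the prediction algorithm errs.

    Assumption (illustrative only): the two error sources are independent
    and errors never cancel each other out.
    """
    for rate in (reference_error, algorithm_error):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("error rates must be between 0 and 1")
    # A prediction is right only if both the reference and the algorithm are right.
    return 1.0 - (1.0 - reference_error) * (1.0 - algorithm_error)


REFERENCE_ERROR = 0.20  # "roughly one in five" annotations wrong, from the text
ALGORITHM_ERROR = 0.10  # hypothetical intrinsic error rate, for illustration only

print(f"{combined_error_rate(REFERENCE_ERROR, ALGORITHM_ERROR):.2f}")  # prints 0.28
```

Under these assumptions, a classifier that would be wrong 10% of the time with a perfect reference would be wrong about 28% of the time with a reference in which one in five annotations is wrong. Real error rates will differ, since errors in practice are unlikely to be fully independent.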