| See also
 OTU clustering
 SSU metagenomics
 What is an OTU?
 Finding species in SSU reads
 Species abundance estimates
 
Constructing OTUs by database matching
Ideally, given a set of 16S reads, we would like to identify all known and 
novel species. In practice, this is very challenging owing to
several complications. Typically, many reads 
do not match a reference database well enough to allow a species assignment.
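
As a rough illustration of database matching, the sketch below assigns a read to its best-matching reference sequence and leaves it unassigned when the best match is too weak. It is a minimal sketch, not the method of any particular tool: the names `identity` and `assign_species`, the reference labels, and the default cutoff of 97 are all hypothetical, and the sequences are assumed to be pre-aligned to the same length (real pipelines compute an alignment first).

```python
from typing import Dict, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def assign_species(
    read: str, references: Dict[str, str], threshold: float = 97.0
) -> Tuple[Optional[str], float]:
    """Return (best reference name, identity), or (None, identity) if below threshold."""
    best_name: Optional[str] = None
    best_id = 0.0
    for name, ref in references.items():
        pid = identity(read, ref)
        if pid > best_id:
            best_name, best_id = name, pid
    if best_id < threshold:
        # No reference is close enough: the read stays unassigned.
        return None, best_id
    return best_name, best_id
```

Reads that come back with `None` are the ones that do not match the database well enough for a species assignment.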
 Identity threshold for species assignment
Traditionally, a 97% match has been considered sufficient for species 
assignment in 16S sequences, though this is only 
approximate: sometimes two different species have identical 16S sequences, and 
conversely a single species may have two copies of the 16S gene that are less 
than 97% identical. With shorter reads, the 97% cutoff becomes an even worse 
approximation.
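
To make the threshold concrete, here is a minimal sketch of how a percent identity could be computed and how many differences a cutoff allows at a given alignment length. The function names `percent_identity` and `max_differences` are hypothetical, and the identity definition used here (matching columns divided by alignment columns, with gaps counted as differences) is one of several in use and is an assumption of this example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored; any other
    column containing a gap or a mismatch counts as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def max_differences(length: int, threshold_pct: int = 97) -> int:
    """Largest number of differing columns that still meets the cutoff."""
    return length * (100 - threshold_pct) // 100
```

Under this definition, a 97% cutoff permits 45 differences over a 1500-column alignment but only 7 over 250 columns, so on short reads each individual difference moves the percentage much further.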
 Constructing OTUs by de novo clustering
Usually, the best we can do with unmatched reads is to cluster them into groups that 
are at least 97% similar. For consistency, database matching is often done after 
clustering so that some OTUs are assigned to species and others are flagged as 
novel or unknown. Some of these clusters may consist of reads derived from PCR artifacts such 
as undetected chimeras, and others may be 
due to gene duplications in known or novel species.
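
The clustering idea can be sketched with a generic greedy centroid scheme: each sequence joins the first existing cluster whose centroid it matches at or above the threshold, and otherwise founds a new cluster. This is only an illustration under stated assumptions, not the algorithm of any specific program. The names `identity` and `greedy_cluster` are hypothetical, sequences are assumed to be pre-aligned to equal length, and the input order (for example, by decreasing abundance) is left to the caller.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 97.0) -> Dict[str, List[str]]:
    """Map each centroid sequence to the list of sequences in its cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                # Close enough to an existing centroid: join that cluster.
                clusters[centroid].append(seq)
                break
        else:
            # No centroid matched: this sequence becomes a new centroid.
            clusters[seq] = [seq]
    return clusters
```

Note that the result depends on input order, and that members are only guaranteed to be within the threshold of their centroid, not of one another.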
 Do not expect a one-to-one correspondence between OTUs and species
Due to the complications discussed above, we cannot expect a 1:1 
correspondence between OTUs and species. At best, we can aim for a 1:1 
correspondence between OTUs and unique copies of the 16S gene, though this ideal 
is undermined by experimental error that is hard or impossible to eliminate, 
including sequencing errors and PCR artifacts.
 
 |