Operational Taxonomic Units (OTUs)
In traditional numerical taxonomy (Sokal and Sneath, 1963; Sneath and Sokal,
1973), an Operational Taxonomic Unit (OTU) is a term that means "the thing(s)
being studied". The definition is intentionally vague. The "thing(s)" could be
an individual organism, a named taxonomic group such as a species or genus, or a
group with undetermined evolutionary relationships that share a given set of
observed characters. It is up to a scientist to specify and justify his or her
definition of OTUs in the context of a particular study.
Can traditional numerical taxonomy methods be used for 16S reads?
Methods from numerical taxonomy are often applied to next-generation marker
gene sequencing studies, where organisms are not directly observed. An OTU is
typically defined as a cluster of reads with 97% similarity, motivated by the
expectation that these clusters correspond approximately to species. This is
reasonable provided that downstream analysis takes into account that the
correspondence of OTUs with species may fail because:
(i) some species have genes that are >97% similar, giving merged OTUs
containing multiple species,
(ii) a single species may have paralogs that are <97% similar, causing the
species to be split across two or more OTUs, and
(iii) some clusters, even a majority, may be spurious due to artifacts
including read errors and chimeras.
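
Failure modes (i) and (ii) can be illustrated with a minimal sketch. It assumes
toy sequences that are already aligned to the same length, measures similarity
as the fraction of matching positions, and uses a simple greedy first-match
clustering. The function names, the 100 nt "gene" and the species labels are
all hypothetical, and this is not the algorithm of any particular clustering
tool.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Put each read in the first cluster whose centroid it matches at >= threshold."""
    centroids: List[str] = []        # sequence of the first member of each cluster
    clusters: List[List[str]] = []   # read names per cluster
    for name, seq in reads.items():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(name)
                break
        else:
            # No centroid is similar enough, so this read starts a new cluster.
            centroids.append(seq)
            clusters.append([name])
    return clusters


def mutate(seq: str, positions: List[int]) -> str:
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


base = "ACGT" * 25                        # hypothetical 100 nt marker gene
copy1 = mutate(base, list(range(0, 40, 4)))       # 10 differences from base
copy2 = mutate(copy1, [60, 70, 80, 90, 95])       # 5 differences from copy1

reads = {
    "speciesA": base,
    "speciesB": mutate(base, [10, 50]),   # 98% similar to speciesA
    "speciesC_copy1": copy1,              # paralog 1 of speciesC
    "speciesC_copy2": copy2,              # paralog 2, 95% similar to paralog 1
}

clusters = greedy_cluster(reads)
print(clusters)
```

With these toy inputs, speciesA and speciesB fall into one cluster (case i),
while the two paralogs of speciesC end up in separate clusters (case ii).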
Traditional methods, including rarefaction curves to
assess species richness and alpha and beta diversity estimators, implicitly
assume that OTUs are observations of organisms with negligible error, and that
the number of observations (reads) correlates well with the total number of
individuals present in the community. I believe
that these methods must be modified in cases where OTUs do not reliably
correspond to species or monophyletic groups, especially if OTUs with lower
abundance are more likely to be artifacts. Similar considerations apply to
inferences based on the RDP Classifier, which may report a chimera as a novel
genus, or methods that require building a phylogenetic tree, e.g. for UniFrac,
where the tree topology will be disrupted by chimeras. If a majority of OTUs are
experimental artifacts, then traditional species richness estimates are not
valid, and measures of between-sample variation will tend to reflect differences
in artifact frequencies rather than biological differences.
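
The effect of spurious low-abundance OTUs can be sketched with two common
measures. The choice of measures is an assumption made for illustration and is
not prescribed by the text above: the bias-corrected Chao1 richness estimator,
S_obs + F1(F1-1)/(2(F2+1)), where F1 and F2 are the numbers of singleton and
doubleton OTUs, and the Jaccard distance on presence/absence. All OTU names and
counts are hypothetical.

```python
from typing import Dict


def chao1(counts: Dict[str, int]) -> float:
    """Bias-corrected Chao1 estimate: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = sum(1 for c in counts.values() if c > 0)
    f1 = sum(1 for c in counts.values() if c == 1)   # singleton OTUs
    f2 = sum(1 for c in counts.values() if c == 2)   # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def jaccard_distance(a: Dict[str, int], b: Dict[str, int]) -> float:
    """1 - |shared OTUs| / |all OTUs|, using presence/absence only."""
    present_a = {otu for otu, c in a.items() if c > 0}
    present_b = {otu for otu, c in b.items() if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical community: 8 real OTUs, identical in both samples.
true_counts = {"otu%d" % i: c
               for i, c in enumerate([500, 300, 120, 40, 20, 10, 6, 2])}

# Each sample also contains 12 spurious singleton OTUs (e.g. chimeras),
# and the artifacts differ between samples.
sample1 = dict(true_counts)
sample1.update({"s1_artifact%d" % i: 1 for i in range(12)})
sample2 = dict(true_counts)
sample2.update({"s2_artifact%d" % i: 1 for i in range(12)})

print("Chao1 without artifacts:", chao1(true_counts))                 # 8.0
print("Chao1 with artifacts:   ", chao1(sample1))                     # 53.0
print("Jaccard without artifacts:", jaccard_distance(true_counts, true_counts))  # 0.0
print("Jaccard with artifacts:   ", jaccard_distance(sample1, sample2))          # 0.75
```

In this toy case the two samples are biologically identical, yet the artifacts
alone raise the richness estimate from 8 to 53 and the between-sample distance
from 0 to 0.75.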
Sokal, RR and Sneath, PHA (1963), Principles of Numerical Taxonomy, San
Francisco: W.H. Freeman.
Sneath, PHA and Sokal, RR (1973), Numerical Taxonomy, San Francisco: W.H.