Closed-reference OTU algorithm
Download QIIME-compatible Greengenes 97% OTU
Problems with closed- and open-reference OTU assignment
The closed_ref command performs closed-reference OTU
assignment using a similar strategy to
pick_closed_reference_otus.py in QIIME.
You can download the
default database here.
I am providing this command
because a few users have asked for it, but
I do not recommend closed- or open-reference
OTU assignment because my tests show that
both methods have fundamental flaws.
The main use
of the closed_ref command is to generate OTUs that are compatible with analyses
that require closed-reference OTUs, in particular
PICRUSt. Given the problems
with closed-reference assignment, and the difficulty of predicting traits by
comparing short 16S reads to sparse reference databases (see discussion in the
SINAPS algorithm page), I am skeptical
that reliable predictions are possible with the PICRUSt approach.
The closed_ref command can be used as an alternative
to the QIIME pick_closed_reference_otus.py script, but the results
will not be exactly the same. The
database search method used in QIIME (at least as of v1.9) is the
old uclust program (the predecessor of usearch) with default parameters,
which were designed primarily to maximize speed. The
closed_ref command in usearch uses an improved implementation of the
USEARCH algorithm, with settings designed to increase sensitivity and to
report ties where two or more reference sequences have the same identity.
The -strand option is required; it can be set to -strand plus (search only
on the plus strand) or -strand both (search both strands). You can use
-strand plus if you know the reads are on the same strand as the database,
which makes the search a bit faster.
The minimum sequence identity is specified by the -id option; the default is 0.97.
The value is a fraction between 0.0 and 1.0, so 0.97 corresponds to 97% identity.
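As a minimal sketch of how the -strand and -id options fit together, the shell helper below converts a percent identity to the fraction expected by -id, checks that the strand is plus or both, and prints the resulting command without running it. The function names pct_to_fraction and build_closed_ref_cmd are hypothetical, and the file names are simply the ones used in the example on this page.

```shell
#!/bin/sh
# Hypothetical helpers (not part of usearch): build a closed_ref command line
# and print it as a dry run instead of executing it.

pct_to_fraction() {
    # -id expects a fraction between 0.0 and 1.0, so 97 becomes 0.97.
    awk -v p="$1" 'BEGIN { printf "%.2f\n", p / 100 }'
}

build_closed_ref_cmd() {
    # $1 = reads file, $2 = reference database,
    # $3 = identity in percent, $4 = strand (plus or both)
    case "$4" in
        plus|both) ;;
        *) echo "strand must be plus or both" >&2; return 1 ;;
    esac
    id=$(pct_to_fraction "$3")
    echo "usearch -closed_ref $1 -db $2 -strand $4 -id $id -otutabout otutab.txt -tabbedout closed.txt"
}

# Search both strands at 97% identity.
build_closed_ref_cmd reads.fastq gg97.fa 97 both
```

Printing the command first makes it easy to check the option values before starting a long search.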
Standard OTU table output files are supported, for example -otutabout.
The -tabbedout file reports one line per query sequence.
Fields are query label, OTU label, identity and a list of ties if more than
one OTU has the same identity.
usearch -closed_ref reads.fastq -db gg97.fa -otutabout otutab.txt -strand plus -tabbedout closed.txt
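The -tabbedout file described above can be summarized with standard tools. The sketch below is an illustration under assumptions: it assumes the fields are tab-separated in the order query label, OTU label, identity, ties, and that the ties field is absent or empty when there is no tie. The sample file closed_sample.txt, its labels and its identity values are made up for the example, and summarize_tabbedout is a hypothetical helper name.

```shell
#!/bin/sh
# Made-up sample in the assumed layout: query, OTU, identity, optional ties.
# Labels and values are illustrative only, not real usearch output.
printf 'read1\totu_A\t99.2\n'        >  closed_sample.txt
printf 'read2\totu_A\t97.5\totu_B\n' >> closed_sample.txt
printf 'read3\totu_C\t98.0\n'        >> closed_sample.txt

summarize_tabbedout() {
    # Count queries per OTU (field 2) and queries with a non-empty
    # ties field (field 4), then sort the report lines.
    awk -F '\t' '
        { count[$2]++ }
        NF >= 4 && $4 != "" { ties++ }
        END {
            for (otu in count) printf "%s\t%d\n", otu, count[otu]
            printf "queries_with_ties\t%d\n", ties
        }
    ' "$1" | sort
}

summarize_tabbedout closed_sample.txt
```

To apply it to real output, pass the file given to -tabbedout (closed.txt in the command above) and adjust the field handling if the actual layout differs from these assumptions.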