Chimera detection using the UCHIME2 algorithm. A user-supplied reference
database is searched for parent sequences. See the UCHIME2 paper for details.
A file of nucleotide sequences must be specified using the -db
option. The database may be in FASTA or UDB format. UDB format is faster to
load. The reference database should include sequences that might appear as
parents in the query set.
Do not use UCHIME(2) for OTU clustering or denoising!
I do not recommend using uchime2_ref or uchime2_denovo in an OTU clustering
pipeline because of the risk of false positives. The unoise commands have
built-in de novo chimera filtering, which works very well for most data.
In most cases it is strongly recommended to use the largest possible
database, e.g. SILVA for 16S or UNITE for ITS. The advice to use a small,
high-quality database in the first UCHIME paper and in previous versions of
the USEARCH manual was wrong!
The following output files are supported:
-uchimeout out.txt (tabbed text)
-chimeras ch.fa (FASTA file with predicted chimeras)
-notmatched not.fa (FASTA file with sequences not matched to the database)
-uchimealnout aln.txt (alignments)
The -nonchimeras option is no longer supported. This is because it is not
possible to determine that a sequence is non-chimeric; the best we can say
is that it is found / not found in the database (the reasons are explained
in the UCHIME2 paper). The -notmatched output is the equivalent for
uchime2_ref, but as with uchime_ref you should not interpret the output as
containing non-chimeric sequences!
The -mode option is required and must be one of:
high_confidence: Reports chimera predictions with high confidence, at the
expense of a high false-negative rate.
specific: Reports chimera predictions with high confidence, at the expense
of a high false-negative rate. Similar to high_confidence mode, but less
stringent, so the false-negative rate is lower but the false-positive rate
may be higher. Gives results similar to the old UCHIME algorithm.
balanced: Attempts to balance false negatives and false positives to
minimize the overall error rate on typical data. Of course, the rates are
highly data-dependent.
sensitive: Emphasizes high sensitivity at the expense of a high
false-positive rate.
denoised: Reports all perfect chimeric models. Mostly used for designing and
validating algorithms. This mode is rarely, if ever, useful in practice
because the database is implicitly assumed to be complete (i.e., all parent
sequences are exactly present) and the query set and database are both
assumed to have no errors. A single difference will prevent the model from
being reported, causing false negatives. Conversely, fake models are common,
causing false positives (see the UCHIME2 paper for details).
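Because the error rates depend so heavily on the data, it can help to run the same query set under more than one mode and compare how many chimeras each one predicts. The following is a minimal sketch under stated assumptions, not a recommended pipeline: the helper name uchime2_cmd and the per-mode output file names are hypothetical, reads.fasta and 16s_ref.udb are placeholders, and the command is assumed to be invoked as -uchime2_ref. Only the two mode names high_confidence and sensitive are used. The script prints the command instead of running it when usearch or the input file is not available.

```shell
#!/bin/sh
# Print the uchime2_ref command line for one mode.
# File names are placeholders; uchime2_cmd is a hypothetical helper.
uchime2_cmd() {
    m=$1
    printf 'usearch -uchime2_ref reads.fasta -db 16s_ref.udb -chimeras ch_%s.fa -uchimeout out_%s.txt -strand plus -mode %s\n' "$m" "$m" "$m"
}

for mode in high_confidence sensitive; do
    cmd=$(uchime2_cmd "$mode")
    if command -v usearch >/dev/null 2>&1 && [ -f reads.fasta ]; then
        # Word splitting is intended here: the placeholders contain no spaces.
        $cmd
        # Count predicted chimeras (one FASTA header line per sequence).
        n=$(grep -c '^>' "ch_$mode.fa" || true)
        echo "$mode: $n predicted chimeras"
    else
        echo "would run: $cmd"
    fi
done
```

A large gap between the two counts is a hint that many predictions are borderline for this data set; it does not say which mode is closer to the truth.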
The -strand option is required. Currently this must be specified as -strand
plus because searching on both strands is not supported.
Multithreading is supported.
The -self option specifies that a reference sequence matching the query
sequence should be ignored. This is useful for estimating the false-positive
rate using a database of sequences known to be free of chimeras: in that
case, -self performs a leave-one-out test. The -self option requires that the
query and database are the same file.
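The leave-one-out test described above can be sketched as follows. This is an illustration under assumptions rather than an official procedure: ref.fa stands for a hypothetical chimera-free FASTA database, the helper functions count_records and fp_rate_percent are invented for this example, and the command is assumed to be invoked as -uchime2_ref. Since every sequence in ref.fa is believed to be non-chimeric, any sequence written to the -chimeras file counts as a false positive.

```shell
#!/bin/sh
# Count FASTA records (header lines starting with '>') in a file.
count_records() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is ignored.
    grep -c '^>' "$1" || true
}

# Percentage of database sequences that were predicted to be chimeric.
# $1 = chimera-free database (FASTA), $2 = -chimeras output (FASTA)
fp_rate_percent() {
    total=$(count_records "$1")
    flagged=$(count_records "$2")
    awk -v f="$flagged" -v t="$total" \
        'BEGIN { if (t == 0) print "NA"; else printf "%.2f\n", 100 * f / t }'
}

# Run the test only if usearch and the (hypothetical) input file exist.
if command -v usearch >/dev/null 2>&1 && [ -f ref.fa ]; then
    # Query and database are the same file, as -self requires.
    usearch -uchime2_ref ref.fa -db ref.fa -self \
        -chimeras ch.fa -uchimeout out.txt -strand plus -mode sensitive
    echo "estimated false-positive rate: $(fp_rate_percent ref.fa ch.fa)%"
fi
```

The estimate applies only to the chosen mode and database, so it is worth repeating for each mode under consideration.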
R. C. Edgar (2016), UCHIME2: Improved chimera detection for amplicon
usearch -uchime2_ref reads.fasta -db 16s_ref.udb -uchimeout out.txt -strand plus -mode sensitive