Annotation of amplicon sequences using the UPARSE-REF
algorithm. This command is designed for use in validating mock community sequencing
experiments. A database file of nucleotide sequences must be specified using the -db
option. The database may be in FASTA or UDB format. The reference database should include
all biological sequences
that are expected to appear in the input set. The database should be complete
and correct as far as possible, and should not be any larger than necessary.
Do not use a large reference database
such as Greengenes, SILVA or
the gold reference database for UCHIME.
The main use for uparse_ref
is to annotate reads, OTUs and other sequences generated from mock community experiments
where the biological sequences in the sample are known.
The uparse_ref command does
not perform well on benchmarks developed to validate ChimeraSlayer and UCHIME. It is
not designed for use as a
general-purpose chimera detection or chimera filtering method.

The -strand option is required and must be specified as -strand plus. This means that the
database must be oriented on the same strand as the query sequences (or contain
both forward and reverse-complemented reference sequences).
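
If the orientation of the query sequences is not known in advance, one way to satisfy -strand plus is to add a reverse-complemented copy of every reference sequence to the database FASTA file. The following is a minimal shell sketch, not part of usearch. The function name add_revcomp, the _rc label suffix and the file names are assumptions chosen for illustration, and only A, C, G, T and N are complemented (other characters are copied unchanged).

```shell
# Sketch: write each FASTA record followed by its reverse complement.
# Hypothetical helper; the "_rc" label suffix is an arbitrary choice.
add_revcomp() {
    awk '
        BEGIN {
            from = "ACGTNacgtn"
            to   = "TGCANtgcan"
        }
        # Complement one base; characters outside the table pass through unchanged.
        function comp(b,    p) {
            p = index(from, b)
            return p ? substr(to, p, 1) : b
        }
        # Print the current record, then the same record reverse-complemented.
        function flush_record(    i, rc) {
            if (!have_record) return
            rc = ""
            for (i = length(seq); i >= 1; i--) rc = rc comp(substr(seq, i, 1))
            print ">" label
            print seq
            print ">" label "_rc"
            print rc
        }
        /^>/ { flush_record(); label = substr($0, 2); seq = ""; have_record = 1; next }
        { seq = seq $0 }
        END { flush_record() }
    ' "$1"
}

# Example use (file names are placeholders):
#   add_revcomp mock_ref.fasta > mock_ref_both_strands.fasta
```

The resulting file doubles the size of the database, so this is only worth doing when the reference cannot simply be oriented to match the reads.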

The -uparseout option
specifies a tab-separated text output file documenting how the input sequences were
annotated. The -fastaout option specifies a FASTA output file
containing all input sequences with labels
annotated according to their UPARSE-REF
models. Generally, the -uparseout file is recommended because it is easier to
understand and parse, but the -fastaout file provides more information.

The -uparsealnout option specifies a text file
containing a human-readable alignment of each query sequence to its UPARSE-REF
model.

Parsimony score options, alignment parameters and
heuristics are supported.
Multithreading is supported.

usearch -uparse_ref otus.fasta -db mock_ref.udb -strand plus -uparseout out.up
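
The three report options described above can be requested in a single run. The sketch below wraps such a call in a small shell function. The function name run_uparse_ref and the output file extensions are assumptions made for illustration, and it assumes the output options can be combined in one command, as is usual for usearch.

```shell
# Sketch: run uparse_ref and write all three report files.
# Arguments: query FASTA, reference database, output file prefix (all placeholders).
run_uparse_ref() {
    usearch -uparse_ref "$1" -db "$2" -strand plus \
        -uparseout "$3.up" \
        -fastaout "$3.fa" \
        -uparsealnout "$3.aln"
}

# Example use:
#   run_uparse_ref otus.fasta mock_ref.udb out
```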