USEARCH v11

uparse_ref command

See also
  annot command

Annotation of amplicon sequences using the UPARSE-REF algorithm. This command is designed for use in validating mock community sequencing experiments where the sequences of the designed community are known. An important limitation of this approach is that sequences due to contaminants and cross-talk are not distinguished from reads with high error rates. The new annot command is now recommended as the first choice for mock community analysis, though uparse_ref can be useful as an independent check using a different underlying algorithm.

A database file of nucleotide sequences must be specified using the -db option. The database may be in FASTA or UDB format. The reference database should include all biological sequences that are expected to appear in the input set. The database should be complete and correct as far as possible, and should not be any larger than necessary. Do not use a large reference database such as Greengenes, SILVA or the gold reference database for UCHIME.

The uparse_ref command does not perform well on benchmarks developed to validate ChimeraSlayer and UCHIME. It is not designed for use as a general-purpose chimera detection or chimera filtering method.

The -strand option is required and must be specified as -strand plus. This means that the database must be oriented on the same strand as the query sequences (or contain both forward and reverse-complemented reference sequences).
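If the query sequences may be in either orientation, one way to meet this requirement is to build a reference file that holds each sequence together with its reverse complement. The following is a minimal sketch and is not part of USEARCH: the function name revcomp_fasta and the _rc label suffix are assumptions made for illustration, and any letter other than A, C, G or T is written as N in the reverse-complemented copy.

```shell
# Sketch: read FASTA on stdin, print every record followed by its reverse complement.
# The function name and the "_rc" label suffix are illustrative, not USEARCH conventions.
revcomp_fasta() {
  awk '
    function emit(    i, c, d, rc) {
      if (label == "") return            # nothing buffered yet
      print ">" label
      print seq
      rc = ""
      for (i = length(seq); i >= 1; i--) {
        c = substr(seq, i, 1)
        d = (c in comp) ? comp[c] : "N"  # letters other than A, C, G, T become N
        rc = rc d
      }
      print ">" label "_rc"
      print rc
    }
    BEGIN {
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
      comp["a"] = "t"; comp["c"] = "g"; comp["g"] = "c"; comp["t"] = "a"
    }
    /^>/ { emit(); label = substr($0, 2); seq = ""; next }
         { seq = seq $0 }                # join wrapped sequence lines
    END  { emit() }
  '
}
```

For example, revcomp_fasta < mock_ref.fasta > mock_ref_both.fasta (both file names are placeholders) would write a FASTA file containing both orientations, which could then be given to the -db option.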

The -uparseout option specifies a tab-separated text output file documenting how the input sequences were classified.

The -fastaout option specifies a FASTA output file containing all input sequences with labels annotated according to their UPARSE-REF models. Generally, the -uparseout file is recommended because it is easier to understand and parse, but the -fastaout file provides more information.

The -uparsealnout option specifies a text file containing a human-readable alignment of each query sequence to its UPARSE-REF model.

Parsimony score options are supported.

Alignment parameters and heuristics are supported.

Multithreading is supported.

Example

usearch -uparse_ref otus.fasta -db mock_ref.udb -strand plus -uparseout out.up
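
The command above writes only the -uparseout file. As a hedged sketch, the helper below composes a command line that also requests the -fastaout and -uparsealnout files described earlier, assuming the three output options can be combined in one run. The function name uparse_ref_cmd and the .up, .fa and .aln extensions are illustrative choices, not USEARCH conventions. The function only prints the command so that it can be checked before it is run.

```shell
# Sketch: print a uparse_ref command line that requests all three output files.
# The function name and output file extensions are assumptions for illustration.
uparse_ref_cmd() {
  # $1 = input FASTA, $2 = reference database (FASTA or UDB), $3 = output prefix
  printf 'usearch -uparse_ref %s -db %s -strand plus -uparseout %s.up -fastaout %s.fa -uparsealnout %s.aln\n' \
    "$1" "$2" "$3" "$3" "$3"
}

# Print the command for the file names used in the example above.
uparse_ref_cmd otus.fasta mock_ref.udb out
```

Piping the printed line to sh would execute it on a system where usearch is installed.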