USEARCH v11

UPARSE-REF algorithm

 
See also
uparse_ref command
cluster_otus command
 
OTU clustering

Given a reference database D containing the sequences in a sample, which is assumed to be complete and correct, UPARSE-REF uses parsimony to infer the errors in a sequence. The goal is to explain a given sequence S with the fewest possible events starting from sequences in D. Here, "events" are mutations that arise from PCR or sequencing errors. This is done by constructing a model sequence M from one or more sequences in the database (refseqs). Typically, M is a single refseq, representing a non-chimeric amplicon. Otherwise, M is a concatenation of m refseq segments, representing a chimeric amplicon. If M has one segment, i.e. is a single refseq, then the distance between M and S is defined as the number of mismatches, which are interpreted as sequencing or PCR errors.
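The single-segment case can be illustrated with a minimal Python sketch. It is not the USEARCH implementation: it assumes that S and every refseq are already aligned, ungapped and of equal length, and the function names `mismatches` and `best_single_refseq` are hypothetical.

```python
def mismatches(model: str, seq: str) -> int:
    """Count mismatching positions between a model M and a sequence S.

    Assumption for this sketch: both strings are pre-aligned, ungapped
    and of equal length.
    """
    if len(model) != len(seq):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")
    return sum(a != b for a, b in zip(model, seq))


def best_single_refseq(seq: str, refseqs: dict[str, str]) -> tuple[str, int]:
    """Return (name, distance) of the refseq that explains seq with the fewest mismatches."""
    scored = ((name, mismatches(ref, seq)) for name, ref in refseqs.items())
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy database D with two hypothetical refseqs.
    D = {"refA": "ACGTACGT", "refB": "ACGTTTTT"}
    S = "ACGTACGA"
    print(best_single_refseq(S, D))
```

In this toy example the most parsimonious single-segment model for S is refA, at a distance of one mismatch.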

The figure below shows an example in which the read has a chimeric model. Here, the penalty for a chimeric crossover is +3 and the penalty for a mismatch is +1, so the total score for the model is 4 (+1 for one mismatch, +3 for one chimeric crossover).

[Figure: example of a read with a chimeric model]
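The scoring in this example (+1 per mismatch, +3 per chimeric crossover) can be reproduced with a small dynamic-programming sketch in Python. This is an illustration under simplifying assumptions, not the algorithm as implemented in USEARCH: the sequences are assumed to be pre-aligned, ungapped and of equal length, and the function name `best_model_score` and its parameters are hypothetical.

```python
def best_model_score(seq: str, refseqs: dict[str, str],
                     mismatch_penalty: int = 1,
                     crossover_penalty: int = 3) -> int:
    """Lowest score of any model M built from refseq segments.

    Assumption for this sketch: seq and all refseqs are pre-aligned,
    ungapped and of equal length, so position i of the model is taken
    from position i of some refseq.
    """
    if any(len(ref) != len(seq) for ref in refseqs.values()):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")

    # cost[name] = best score of a model covering the positions seen so far
    # whose last segment comes from refseq `name`.
    cost = {name: 0 for name in refseqs}
    for i, base in enumerate(seq):
        cheapest_prev = min(cost.values())
        new_cost = {}
        for name, ref in refseqs.items():
            stay = cost[name]                           # extend the current segment
            switch = cheapest_prev + crossover_penalty  # start a new segment here
            step = mismatch_penalty if ref[i] != base else 0
            new_cost[name] = min(stay, switch) + step
        cost = new_cost
    return min(cost.values())


if __name__ == "__main__":
    # Toy database with two hypothetical refseqs.
    D = {"refA": "AAAAAAAAAAAA", "refB": "CCCCCCCCCCCC"}
    # First half matches refA, second half matches refB except the last base.
    S = "AAAAAACCCCCG"
    print(best_model_score(S, D))
```

For this toy read, a single-refseq model would cost at least 6 mismatches, whereas a two-segment model costs one crossover plus one mismatch, giving the same total of 4 as in the example above.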

UPARSE-REF is used internally as a step in the UPARSE-OTU algorithm for OTU construction (cluster_otus command). The main use for UPARSE-REF as a standalone command (uparse_ref) is annotation of reads, OTUs and other sequences in mock community experiments where the set of biological sequences in the sample is known.