FAQ: Does UNOISE work on 454, Ion Torrent, PacBio... reads

USEARCH v11

FAQ: Does UNOISE work on 454, Ion Torrent, PacBio... reads?

UNOISE was designed and tested on Illumina reads. I've done some quick testing on 454 and Ion Torrent reads; the results are not as good as with Illumina, but might be good enough to be useful.

The most obvious problems are due to indel errors, which arise from incorrect homopolymer lengths. These cause two issues: incorrect alignments and inflated abundance of incorrect reads. Incorrect alignments occur when an insertion and a deletion fall close together, e.g.:

ABCDEFG correct sequence
ABDEEFG bad read

The correct alignment is

ABCDE-FG
AB-DEEFG

However, the shorter, ungapped alignment of the two sequences as first shown (two mismatches, no gaps) gets a higher score because of gap penalties. If you use lower gap penalties, you solve some of these cases but get other problems.
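To make the scoring argument concrete, here is a minimal sketch that scores both alignments of the example column by column. The function `score_alignment` and the values for match, mismatch and gap are assumptions chosen for illustration only; they are not the scoring scheme or defaults used by UNOISE or USEARCH, and a linear per-gap-column penalty is used instead of separate open and extend penalties.

```python
def score_alignment(row_a, row_b, match=1.0, mismatch=-2.0, gap=-10.0):
    """Score a pairwise alignment given as two equal-length rows.

    '-' marks a gap. The scoring values are illustrative assumptions,
    not the parameters of any particular aligner.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    score = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += gap        # column with a gap in either row
        elif x == y:
            score += match      # identical letters
        else:
            score += mismatch   # substitution
    return score


# Ungapped alignment: 5 matches, 2 mismatches.
ungapped = ("ABCDEFG", "ABDEEFG")
# Correct alignment: 6 matches, 1 deletion and 1 insertion.
gapped = ("ABCDE-FG", "AB-DEEFG")

for gap_penalty in (-10.0, -1.0):
    s_ungapped = score_alignment(*ungapped, gap=gap_penalty)
    s_gapped = score_alignment(*gapped, gap=gap_penalty)
    winner = "gapped (correct)" if s_gapped > s_ungapped else "ungapped (wrong)"
    print(f"gap={gap_penalty}: ungapped={s_ungapped}, gapped={s_gapped}, best={winner}")
```

With the high gap penalty assumed here the ungapped alignment wins, and with the low one the correct gapped alignment wins. The sketch only scores two given alignments; it does not show the other problems that lower gap penalties introduce elsewhere.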

The abundance problem arises because different errors give the same sequence. E.g., if you delete one B in ABBC you get ABC, and there are two different ways to do this. The abundance of ABC is then double what it "should" be given the frequency of deletion errors. In general, for a homopolymer of length N, the abundance of the read with one letter deleted is N times higher than it should be. The same applies to inserting one extra B.
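The counting argument can be checked by enumerating every single-letter deletion and tallying how many of them produce the same sequence. This is a minimal sketch; the helper name `single_deletion_counts` is hypothetical, and it assumes each position is equally likely to be deleted, which is a simplification of real homopolymer error behavior.

```python
from collections import Counter


def single_deletion_counts(seq):
    """Count how many distinct single-letter deletions yield each result.

    Assumes every position is deleted with equal probability, so the
    count is proportional to the expected abundance of each bad read.
    """
    return Counter(seq[:i] + seq[i + 1:] for i in range(len(seq)))


counts = single_deletion_counts("ABBC")
for result, ways in sorted(counts.items()):
    print(result, ways)

# A homopolymer of length N gives N ways to reach the same shortened read.
for n in (2, 3, 5):
    seq = "A" + "B" * n + "C"
    shortened = "A" + "B" * (n - 1) + "C"
    print(n, single_deletion_counts(seq)[shortened])
```

For ABBC, the read ABC is reached in two ways while ABB and BBC are each reached in one, which matches the factor of two described above.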

As always, the best way to check is to use control samples, especially mock communities. I strongly recommend including mock samples in all sequencing runs because this is by far the best way to validate the entire pipeline from sequencing to OTUs or denoised sequences. Without mock samples, it is difficult or impossible to measure cross-talk, reagent contaminants, accuracy of Phred scores, chimera formation and the final rate of spurious OTUs.