FAQ: Does UNOISE work on 454, Ion Torrent, PacBio... reads?

UNOISE was designed and tested on Illumina reads. I've done some quick testing on 454 and Ion Torrent reads, and the results are not as good as with Illumina, but they might be good enough to be useful.

The most obvious problems are due to indel errors, which arise from incorrect homopolymer lengths. These cause two issues: incorrect alignments and inflated abundance of incorrect reads. Incorrect alignments occur when an insertion and a deletion occur close together, e.g.:

ABCDEFG correct sequence
ABDEEFG bad read

The correct alignment is:

ABCDE-FG correct sequence
AB-DEEFG bad read

However, the shorter ungapped alignment, with the two sequences lined up as first shown, gets a higher score because of gap penalties. If you use lower gap penalties, you solve some of these cases but create other problems.
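
To make the scoring argument concrete, here is a minimal sketch that scores both alignments of the example pair: the ungapped one (two mismatches) and the gapped one (C deleted, E inserted, no mismatches). The function alignment_score and the match, mismatch and gap values are assumptions chosen for illustration only. They are not the parameters UNOISE or USEARCH actually use, and real aligners typically use separate gap-open and gap-extend penalties.

```python
# Toy alignment scorer. The scores below are illustrative assumptions,
# not the parameters used by UNOISE or USEARCH.

def alignment_score(row_a, row_b, match=1, mismatch=-2, gap=-3):
    """Score two equal-length alignment rows; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap        # one side is a gap
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # substitution
    return score

# Ungapped alignment: two mismatches (C/D and D/E), no gaps.
ungapped = ("ABCDEFG", "ABDEEFG")
# Gapped alignment: C deleted and E inserted, no mismatches.
gapped = ("ABCDE-FG", "AB-DEEFG")

for gap in (-3, -1):
    s_ungapped = alignment_score(*ungapped, gap=gap)
    s_gapped = alignment_score(*gapped, gap=gap)
    print(f"gap={gap}: ungapped={s_ungapped}, gapped={s_gapped}")
```

With the assumed gap penalty of -3 the ungapped alignment scores higher (1 vs. 0). With a gap penalty of -1 the gapped alignment wins (4 vs. 1), but a penalty that low would also make gaps cheap in places where a substitution is the better explanation.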

The abundance problem arises because different errors give the same sequence. E.g., if you delete one B in ABBC you get ABC, and there are two different ways to do this. The abundance of ABC is then double what it "should" be considering the frequency of deletion errors. In general, for a homopolymer of length N, the abundance is N times higher than it should be. A similar inflation applies to inserting one extra B.
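
The counting argument can be checked by brute force. The sketch below deletes each letter of a sequence in turn and counts how many distinct deletion events give the same result. The function name single_deletion_counts is hypothetical, and the sketch assumes every position is equally likely to be deleted, which is a simplification of real homopolymer error behavior.

```python
from collections import Counter

def single_deletion_counts(seq):
    """Count how many single-letter deletions yield each shorter sequence."""
    # Deleting position i gives seq without its i-th letter; identical
    # results from different positions are tallied together.
    return Counter(seq[:i] + seq[i + 1:] for i in range(len(seq)))

counts = single_deletion_counts("ABBC")
print(counts["ABC"])  # 2: either B can be deleted
print(counts["BBC"])  # 1: only one way to delete the A

# A homopolymer of length 3 gives three ways to reach the same read.
print(single_deletion_counts("ABBBC")["ABBC"])  # 3
```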

As always, the best way to check is to use control samples, especially mock communities. I strongly recommend including mock samples in all sequencing runs because this is by far the best way to validate the entire pipeline from sequencing to OTUs or denoised sequences. Without mock samples, it is difficult or impossible to measure cross-talk, reagent contaminants, accuracy of Phred scores, chimera formation and the final rate of spurious OTUs.