FAQ: Does UNOISE work on 454, Ion Torrent, PacBio...

UNOISE was designed and tested on Illumina reads. I've done some quick tests on 454 and Ion Torrent reads. The results are not as good as with Illumina, but may be good enough to be useful.

The most obvious problems are due to indel errors, which arise from incorrect homopolymer lengths. These cause two issues: incorrect alignments and inflated abundances of incorrect reads.

Incorrect alignments occur when an insertion and a deletion occur close together, e.g.:

ABCDEFG   correct sequence
ABDEEFG   bad read

The correct alignment has one deletion (C) and one insertion (an extra E):

ABCDE-FG
AB-DEEFG

However, the shorter, gapless alignment

ABCDEFG
ABDEEFG

gets a higher score because of gap penalties: it has two mismatches but no gaps. If you use lower gap penalties, you fix some of these cases but get other problems.
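
To make the scoring argument concrete, here is a minimal Python sketch that scores both alignments of the example under a simple linear gap penalty, and also computes the optimal global (Needleman-Wunsch) score. The scores used (match +1, mismatch -1, gap -2 or -1) and the function names are illustrative assumptions, not the parameters or code used by UNOISE or USEARCH.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned rows ('-' marks a gap) with a linear gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def nw_best_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


correct = ("ABCDE-FG", "AB-DEEFG")  # one deletion + one insertion
gapless = ("ABCDEFG", "ABDEEFG")    # two mismatches, no gaps

for gap in (-2, -1):
    print(f"gap={gap}: indel alignment={score_alignment(*correct, gap=gap)}, "
          f"gapless={score_alignment(*gapless, gap=gap)}, "
          f"optimum={nw_best_score('ABCDEFG', 'ABDEEFG', gap=gap)}")
# gap=-2: indel alignment=2, gapless=3, optimum=3
# gap=-1: indel alignment=4, gapless=3, optimum=4
```

With a gap penalty of -2 the gapless alignment wins; lowering it to -1 makes the correct indel alignment optimal for this pair, although in general a cheaper gap also makes spurious gaps more attractive elsewhere.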

The abundance problem arises because different errors can give the same sequence. For example, if you delete one B from ABBC you get ABC, and there are two different ways to do this. The abundance of ABC is then double what it "should" be, given the frequency of deletion errors. In general, for a homopolymer of length N the abundance is N times higher than it should be. The same applies to inserting one extra B.
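
The counting argument can be checked by enumerating every single-base deletion of a read and tallying how many positions produce the same sequence. This is a minimal sketch with hypothetical function names; for the insertion case it assumes, for illustration only, that an over-called homopolymer is modeled as one existing base being duplicated in place.

```python
from collections import Counter


def single_deletions(seq):
    """Map each one-base-shorter sequence to the number of positions whose deletion yields it."""
    return Counter(seq[:i] + seq[i + 1:] for i in range(len(seq)))


def single_duplications(seq):
    """Map each one-base-longer sequence to the number of positions whose duplication yields it.

    Assumption for illustration: an insertion error is modeled as an
    existing base being duplicated in place.
    """
    return Counter(seq[:i + 1] + seq[i:] for i in range(len(seq)))


print(single_deletions("ABBC")["ABC"])  # 2: deleting either B gives ABC

# For a homopolymer of length n, both counts equal n.
for n in range(1, 6):
    read = "A" + "B" * n + "C"
    shorter = "A" + "B" * (n - 1) + "C"
    longer = "A" + "B" * (n + 1) + "C"
    print(n, single_deletions(read)[shorter], single_duplications(read)[longer])
# prints "1 1 1", "2 2 2", ..., "5 5 5" on separate lines
```

If each base has the same independent chance of being dropped, the expected count of the shortened read therefore scales with the homopolymer length n rather than with a single error event.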

As always, the best way to check how well UNOISE works on your reads is to use control samples, especially mock communities. I strongly recommend including mock samples in all sequencing runs because this is by far the best way to validate the entire pipeline, from sequencing to OTUs or denoised sequences. Without mock samples, it is difficult or impossible to measure cross-talk, reagent contaminants, the accuracy of Phred scores, chimera formation and the final rate of spurious OTUs.