FAQ: Should you use UPARSE or UNOISE?
There are two different ways to make OTUs: 97% clustering and denoising.
The UPARSE algorithm makes 97% OTUs.
The UNOISE algorithm does denoising, i.e. error correction.
If UPARSE works perfectly, it will give you a subset of
the correct biological sequences in your reads such that no two sequences are
>97% identical to each other. It is implemented in the cluster_otus command.
If UNOISE works
perfectly, it will give you all the correct biological sequences in the reads.
It is implemented in the unoise3 command.
(Of course, no algorithm is perfect so you should expect some mistakes).
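To make the two definitions concrete, here is a minimal Python sketch of what the ideal outputs would look like on a toy set of three correct sequences. This is not the UPARSE or UNOISE algorithm (both also have to deal with sequencing errors and chimeras); it only illustrates the "subset with no two sequences >97% identical" versus "all correct sequences" distinction. The helper names (identity, ideal_97_otus, mutate) and the toy sequences are made up for illustration, and identity is computed by simple position-by-position comparison, which assumes equal-length sequences.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def ideal_97_otus(seqs, threshold=0.97):
    """Greedy subset in which no two sequences are more than `threshold` identical.

    `seqs` is assumed to be ordered by decreasing abundance.
    """
    chosen = []
    for s in seqs:
        if all(identity(s, c) <= threshold for c in chosen):
            chosen.append(s)
    return chosen


def mutate(seq, positions):
    """Return a copy of `seq` with a substitution at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Three hypothetical correct biological sequences, 100 nt each.
strain_a = "ACGT" * 25
strain_b = mutate(strain_a, [10])                 # 99% identical to strain_a
other = mutate(strain_a, [0, 20, 40, 60, 80])     # 95% identical to strain_a
true_seqs = [strain_a, strain_b, other]

otus = ideal_97_otus(true_seqs)   # ideal 97% clustering: a subset
denoised = list(true_seqs)        # ideal denoising: every correct sequence

print(len(otus), "OTUs from 97% clustering")
print(len(denoised), "sequences from denoising")
```

In this toy case strain_b is absorbed because it is 99% identical to strain_a, so ideal 97% clustering reports two sequences while ideal denoising reports all three.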
The UPARSE and UNOISE pipelines are very similar; the main
difference is whether you run cluster_otus or unoise3 as the clustering step.
Once you have made an OTU table, you can
proceed with diversity analysis etc. in the same
way, regardless of whether you used UPARSE or UNOISE.
Which should you
choose? I suggest you try both.
Pros and cons
The main advantage of 97% clustering is that most published papers use it, so this will be easier to explain to
your PI and to referees. The main disadvantage of 97% clustering is that you
discard some correct biological sequences that are present in your reads. If
these represent strains or species with a different phenotype, then you lose
relevant information and the corresponding reads will be lumped together into
one OTU that contains multiple phenotypes.
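The lumping effect can be shown with a small Python sketch. It counts reads per representative sequence for two hypothetical strains that differ at one position out of 100, once with a single 97% OTU and once with both sequences kept, as denoising would ideally do. The names (assign, Otu1, SeqA, SeqB), the read counts, and the position-by-position identity measure are assumptions for illustration only, not how any real OTU table command works.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; assumes equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign(reads, representatives, min_identity=0.97):
    """Count reads per representative, using the closest match at >= min_identity."""
    counts = Counter()
    for read in reads:
        best = max(representatives,
                   key=lambda name: identity(read, representatives[name]))
        if identity(read, representatives[best]) >= min_identity:
            counts[best] += 1
    return counts


# Two hypothetical strains, 99% identical (one substitution in 100 nt).
strain_a = "ACGT" * 25
strain_b = strain_a[:10] + "T" + strain_a[11:]

# Toy read set: 6 reads from strain A, 4 from strain B, no sequencing errors.
reads = [strain_a] * 6 + [strain_b] * 4

# 97% clustering keeps one representative; denoising ideally keeps both.
clustered = assign(reads, {"Otu1": strain_a})
denoised = assign(reads, {"SeqA": strain_a, "SeqB": strain_b})

print(dict(clustered))
print(dict(denoised))
```

With one 97% OTU, all ten reads end up in the same row of the table, so a difference in abundance between the two strains cannot be seen; with both sequences kept, the six and four reads are counted separately.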
The main advantage of denoising is that you get
better resolution by keeping all the biological sequences. The main disadvantage
of denoising is that species often vary between individuals and have paralogs that are not 100% identical. If
you have intra-species variation in the region that you sequenced, then you will get two or more
OTUs for one species. For most purposes, this really doesn't matter -- it might
even be better if this enables you to detect strains with different phenotypes
-- so if I have to recommend one method, then I would recommend denoising.