UPARSE OTU radius

See also
 
Defining and interpreting OTUs

UPARSE OTUs with radius different from 3% (i.e., different from 97% identity)
In previous versions of USEARCH, the cluster_otus command had an otu_radius_pct option for specifying a radius different from the default of 3%. However, using non-default values is not recommended.

The main reason is that chimera detection degrades. Each input sequence is run through UPARSE-REF using the current set of OTUs as a reference database. If the optimal model is chimeric, the sequence is discarded. If an OTU radius > 3% is used, chimera detection becomes more difficult because more true biological sequences are discarded without creating new OTUs. The set of OTU sequences therefore becomes sparser, and the correct parents of a chimera will more often be missing from the OTU database. Chimeras can still be detected when there are OTUs sufficiently close to their parents, but the false negative rate will tend to increase.

Chimera detection also gets more difficult when the OTU radius is < 3%. In this case there are many more false positives due to "fake models", where a correct biological sequence can be exactly reconstructed from segments of two other valid sequences. This surprising result is explained in detail in the UCHIME2 paper.
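To make the idea of a "fake model" concrete, here is a toy sketch in shell. It is not how UPARSE-REF or UCHIME2 work internally: it assumes three sequences of equal length that are already aligned position by position, and it only tests for an exact two-segment reconstruction. The function name find_crossover is hypothetical.

```shell
# Toy check: can query q be rebuilt exactly as (prefix of a) + (suffix of b)?
# Assumes q, a and b have the same length and are already aligned.
# Prints the first crossover position and returns 0 if such a model exists,
# returns 1 if none exists, and 2 if the lengths differ.
find_crossover() {
    awk -v q="$1" -v a="$2" -v b="$3" 'BEGIN {
        n = length(q)
        if (length(a) != n || length(b) != n) exit 2
        for (i = 1; i < n; i++) {
            # Left segment must match parent a, right segment parent b.
            if (substr(q, 1, i) == substr(a, 1, i) &&
                substr(q, i + 1) == substr(b, i + 1)) {
                print i
                exit 0
            }
        }
        exit 1
    }'
}

# Made-up example: the query matches parent a up to position 2
# and parent b from position 3 onwards.
find_crossover ACGTACCC ACGTACGT TTGTACCC   # prints 2
```

If all three sequences here were real biological sequences, the query would still be flagged by a two-parent model, which is the kind of false positive described above.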

Recommended: make OTUs with 100% clustering identity
My current recommendation is to use the UNOISE error-correction (denoising) algorithm to reconstruct the set of correct biological sequences in the reads. These sequences are valid OTUs which I call "ZOTUs" (zero-radius OTUs). This is better than traditional 97% clustering because it gives better phenotype resolution: it allows you to distinguish species and strains which would be lumped together at 97%. See unoise2 command for details.

Recommended procedure for OTUs with clustering identity <100%
To make OTUs at identities different from 97%, the best method is to use UNOISE followed by UCLUST, e.g. the unoise3 command followed by cluster_smallmem. For example, to make OTUs at 100%, 99%, 97%, 95% and 90% identity:

usearch -unoise3 uniques.fa -fastaout otus100.fa -minampsize 4

for id in 99 97 95 90
do
   usearch -cluster_smallmem otus100.fa -id 0.$id -centroids otus$id.fa
done
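
Note that the `0.$id` trick in the loop above only works for two-digit whole percentages such as 99 or 90. The following sketch shows one way to handle other values (for example 99.5) and to compare how many OTUs each identity produces. The helper names pct_to_frac and count_records are hypothetical, not part of USEARCH, and the sketch assumes awk and grep are available.

```shell
# Convert a percent identity (e.g. 97 or 99.5) to a fraction (0.970, 0.995).
# LC_ALL=C forces a decimal point regardless of the user's locale.
pct_to_frac() {
    LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.3f\n", p / 100 }'
}

# Count the sequences in a FASTA file by counting header lines ('>').
count_records() {
    grep -c '^>' "$1"
}
```

For example, `usearch -cluster_smallmem otus100.fa -id "$(pct_to_frac 99.5)" -centroids otus99.5.fa` follows the same pattern as the loop above, and `count_records otus97.fa` reports the number of OTUs at 97%.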