FAQ: Should I increase the expected error threshold for long reads?

A maximum expected error threshold of 1 means that the most probable number of errors is zero, regardless of the read length. I would recommend using this threshold unless you have a good reason to change it. A common objection is that too many reads are discarded, but assuming you are doing OTU analysis, you should find that most of the discarded reads are recovered when you map the unfiltered reads to your OTUs.
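For example, with usearch v10-style commands (filenames are placeholders, and option names may differ in older releases), the filtering step and the read-mapping step might look something like this:

  usearch -fastq_filter reads.fastq -fastq_maxee 1.0 -fastaout filtered.fa

  usearch -otutab reads.fastq -otus otus.fa -otutabout otutab.txt -mapout map.txt

Note that the otutab command is run on the unfiltered reads, so reads discarded by the expected error filter can still be assigned to an OTU if they match well enough.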

If most of the discarded reads are not recovered, you may need to consider other strategies, such as truncating the reads to reduce the expected errors per read.
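To choose a truncation length, the fastq_eestats2 report shows how expected errors accumulate along the read. The cutoff values below are arbitrary illustrations, not recommendations:

  usearch -fastq_eestats2 reads.fastq -output eestats2.txt -length_cutoffs 100,300,20

  usearch -fastq_filter reads.fastq -fastq_trunclen 250 -fastq_maxee 1.0 -fastaout filtered.fa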

Another question to consider is whether you follow my recommendation to discard singletons before OTU clustering. If you do discard singletons, this should take care of a large majority of the "harmful" reads in the tail of the distribution, i.e. those with >3% errors. In that case, you could try using higher expected error thresholds. Suppose you get more OTUs. This could be a good thing (higher sensitivity) or a bad thing (most of the new OTUs are spurious). How could you distinguish these two situations? If you have a lot of spurious OTUs, how would this impact the biological questions you are trying to answer?
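In usearch-style commands, singletons can be discarded by dereplicating with size annotations and then setting a minimum abundance for clustering. This is a sketch; the -minsize option of cluster_otus follows recent versions, so treat it as an assumption if you run an older release:

  usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout

  usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu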

The best way to check error rates is to use a control sample such as a mock community, but most people don't sequence a control sample so this check may not be available. In that case, I prefer to be conservative because most analysis pipelines produce large numbers of spurious OTUs.
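If you do have a mock sample, one rough check is to align the filtered reads to the known reference sequences and examine the difference counts. This is a sketch, with mock_refs.fa standing in for a hypothetical file of your mock reference sequences:

  usearch -usearch_global filtered.fa -db mock_refs.fa -id 0.9 -strand plus -userout mock_hits.txt -userfields query+target+id+mism+diffs

Reads with many diffs to their best-matching reference suggest a high error rate, chimeras or contamination.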