OTU QC: primer-binding sequences

Usually, the reads include the primer-binding sequences of your gene (e.g., 16S). These should be stripped out, because PCR tends to introduce substitutions in those regions, most often by changing bases that mismatch the primer so that they match it.

You can check whether they appear in the OTUs by using the search_oligodb command, e.g.:

usearch -search_oligodb otus.fa -db primers.fa -strand both \
-userout primer_hits.txt -userfields query+qlo+qhi+qstrand

If primers have not been stripped, you will see many hits to the forward primer at the start of the OTUs. If you have full-length paired reads without length trimming, you may also see many hits to the reverse primer at the end of the OTUs. This command won't catch primers that were trimmed incorrectly, e.g. the primer is 20 nt but you only stripped 16. The reason is that all the letters in the database sequence must be included in the alignment (substitutions are allowed but gaps are not allowed, even terminal gaps), so the remaining 4 nt fragment cannot produce a hit.
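
As a rough aid for reading the hits file, here is a minimal Python sketch that tallies hits by strand and counts how many begin at the first base of an OTU. It assumes the output is tab-separated with exactly the four fields requested above (query, qlo, qhi, qstrand) and that qlo is 1-based. The function name summarize_primer_hits and the edge parameter are hypothetical, introduced only for illustration.

```python
import csv
from collections import Counter


def summarize_primer_hits(lines, edge=1):
    """Tally primer hits by strand and by whether they start at the OTU start.

    `lines` is any iterable of tab-separated rows holding the fields
    query, qlo, qhi, qstrand (assumed layout; qlo is taken to be 1-based).
    A hit counts as "at start" when qlo <= edge.
    """
    summary = Counter()
    otus_with_start_hit = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        query, qlo, qstrand = row[0], int(row[1]), row[3]
        summary["hits_" + qstrand] += 1
        if qlo <= edge:
            summary["hits_at_start"] += 1
            otus_with_start_hit.add(query)
    summary["otus_with_hit_at_start"] = len(otus_with_start_hit)
    return dict(summary)
```

To use it on a real file, pass an open file handle, e.g. summarize_primer_hits(open("primer_hits.txt", newline="")). If most OTUs have a hit at position 1, the forward primer was probably not stripped. Detecting hits at the end of an OTU would additionally require the OTU length, which these four fields do not provide.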

Primer stripping should be done before quality filtering (because every base increases the expected errors) and before finding unique sequences (because variation in the primer-binding region will split one biological sequence over several uniques, degrading the calculation of unique sequence abundance). If primers do appear in the OTUs, you should go back and fix the pipeline.
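
To make the point about expected errors concrete, here is a small Python sketch. It assumes the usual definition of expected errors as the sum of per-base error probabilities, with p = 10^(-Q/10) for Phred score Q. The read is hypothetical: a 19-base primer region followed by 200 biological bases, all at Q30.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


# Hypothetical read (made-up lengths and qualities, for illustration only).
primer_quals = [30] * 19   # primer-binding region
insert_quals = [30] * 200  # biological sequence

with_primer = expected_errors(primer_quals + insert_quals)
without_primer = expected_errors(insert_quals)

# Leaving the primer attached adds 19 * 0.001 to the expected errors,
# so the read is more likely to exceed a maximum expected error threshold.
print(with_primer, without_primer)
```

Every retained base can only add to the total, so a read that still carries its primer is judged more harshly by an expected error filter than the same read after stripping.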