OTU QC: sequences on both strands

If reads are created from both strands of the gene, then you will tend to get duplicated OTUs where one is the reverse-complement of the other.

To check for reads or OTU sequences on both strands, use the orient command with -tabbedout orient.txt. Any reference database will do for a quick check, though a large reference database is recommended for orienting the reads in a production pipeline. To get the number of sequences on each strand, use the following Linux command:

cut -f2 orient.txt | sort | uniq -c
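The same tally can be wrapped in a small helper that also signals a mixed-strand result. This is only a sketch: the function name strand_summary is hypothetical, and it assumes column 2 of orient.txt holds the strand as +, - or ?, as the cut command above implies.

```shell
# Hypothetical helper: count strands in an orient -tabbedout file.
# Assumption: field 2 is the strand, written as "+", "-" or "?".
strand_summary() {
    awk -F'\t' '
        { n[$2]++ }                      # tally each strand value
        END {
            printf "plus=%d minus=%d unknown=%d\n", n["+"], n["-"], n["?"]
            # Exit status 1 means both strands are present.
            exit (n["+"] > 0 && n["-"] > 0) ? 1 : 0
        }
    ' "$1"
}
```

For example, strand_summary orient.txt || echo "mixed strands" prints the counts and then the warning only when both plus and minus sequences were found.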

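All your OTUs should be on the same strand. If not, you need to adjust the pipeline to orient the reads before dereplication (the fastx_uniques step). You could also cluster the OTUs with both strands enabled and report the strand of each match; a qstrand of - indicates an OTU that matched another one as its reverse-complement: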

usearch -cluster_fast otus.fa -id 0.97 -strand both \
-userout user.txt -userfields query+target+qstrand
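To pull the reverse-complement matches out of user.txt, something like the following may help. It is a sketch under assumptions: the function name minus_strand_pairs is hypothetical, and it relies on the file having exactly the three tab-separated columns requested by -userfields query+target+qstrand.

```shell
# Hypothetical helper: list query/target pairs that matched on the minus strand.
# Assumption: columns are query, target, qstrand, separated by tabs.
minus_strand_pairs() {
    awk -F'\t' '$3 == "-" { print $1 "\t" $2 }' "$1"
}
```

Running minus_strand_pairs user.txt prints one line per suspect pair; empty output suggests no OTU matched another on the opposite strand at this identity threshold.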