Sometimes reads start at different positions in the gene because a mix of primers that bind to different locations was used. For example, Kozich et al. 2013 used a mix of forward primers for V3 and V4 with reverse primers for V4 and V5, giving three types of amplicon: V3-V4, V4, and V4-V5. These must be separated before making OTUs.

In other cases, there may be "staggered" primers with small offsets (like GATTACA and ATTACAT, which are offset by one base). These are used to increase base diversity in the early sequencing cycles (similar to using a PhiX spike-in) or to reduce the number of species with primer mismatches. With staggered primers, the reads must be trimmed to the same position before dereplication, so in my toy example the G should be deleted from reads starting with GATTACA (this is an example of what I call global trimming).

That said, the cluster_otus and unoise3 commands are designed to handle some types of staggered primer, so this may not be necessary. You can check for offset sequences using cluster_fast, because it does not count terminal gaps as differences. For example,
usearch -cluster_fast otus.fa -id 0.97 -strand both -alnout otus.aln -show_termgaps \
-userout user.txt -userfields query+target+qstrand+qlo+qhi+tlo+thi
Review the alignments for terminal gaps (note the -show_termgaps option) or look for qlo or tlo values in user.txt which are >1, e.g., this Linux command will show qlo values which are not 1:
cut -f4 user.txt | grep -v "^1$"
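The cut command above checks only qlo. Below is a minimal sketch that checks qlo and tlo together and prints the labels of each offending pair. The function name show_offsets is hypothetical. It assumes the tab-separated field order from the cluster_fast command above (query, target, qstrand, qlo, qhi, tlo, thi). It treats plus- and minus-strand hits alike, so check how your USEARCH version reports coordinates for minus-strand hits.

```shell
# Hypothetical helper: report hits whose alignment does not start at
# position 1 in the query (qlo, field 4) or in the target (tlo, field 6).
# Assumes -userfields query+target+qstrand+qlo+qhi+tlo+thi (tab-separated).
show_offsets() {
    awk -F '\t' '$4 > 1 || $6 > 1 { print $1 "\t" $2 "\tqlo=" $4 "\ttlo=" $6 }' "$1"
}
```

Usage: show_offsets user.txt. Empty output means no hit had a terminal offset in either sequence.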
Another way to check is to compare the OTU sequences to the unique sequences, as follows.
usearch -usearch_global uniques.fasta -db zotus.fa -strand both \
-id 1.0 -maxaccepts 4 -maxrejects 64 -userout uniques_vs_zotus.txt \
-userfields query+target
cut -f1 uniques_vs_zotus.txt | sort | uniq -d
cut -f2 uniques_vs_zotus.txt | sort | uniq -d
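The first cut command lists uniques that match more than one ZOTU, and the second lists ZOTUs matched by more than one unique. If either list is non-empty, i.e. more than one unique sequence matches a given OTU with 100% identity, or vice versa, then you must have either offset sequences in your reads or strand duplicates in your OTUs.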
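The uniq -d commands print only labels. The following sketch shows which uniques collide on each ZOTU, which makes offsets easier to inspect. The function name zotus_with_multiple_uniques is hypothetical. It assumes a tab-separated file with query and target in the first two fields, as produced by -userfields query+target.

```shell
# Hypothetical helper: for each ZOTU (field 2) matched by more than one
# unique (field 1), print "ZOTU: unique1 unique2 ...", sorted by ZOTU label.
zotus_with_multiple_uniques() {
    awk -F '\t' '
        { n[$2]++; hits[$2] = hits[$2] " " $1 }
        END { for (z in n) if (n[z] > 1) print z ":" hits[z] }
    ' "$1" | sort
}
```

Usage: zotus_with_multiple_uniques uniques_vs_zotus.txt. The listed uniques can then be extracted and aligned to see whether they differ only by a terminal offset.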
See the shifted sequences warning for further discussion.
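For the staggered-primer toy example, global trimming (deleting the G from reads that start with GATTACA) could be sketched with awk as below. The function name trim_stagger is hypothetical. The sketch assumes unwrapped FASTA with one sequence line per record. It only illustrates the idea; a real pipeline would normally handle primers more generally.

```shell
# Hypothetical helper for the toy example: remove the first base from any
# sequence starting with GATTACA so that all reads start at the ATTACAT
# position. Header lines (starting with ">") are passed through unchanged.
# Assumes unwrapped FASTA (one sequence line per record).
trim_stagger() {
    awk '/^>/ { print; next } /^GATTACA/ { print substr($0, 2); next } { print }' "$1"
}
```

Usage: trim_stagger reads.fa > trimmed.fa, run before dereplication.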