usearch manual
OTU QC: short sequences

See also
  Quality control for OTU sequences

Sometimes reads start at slightly (or very) different positions in the gene because a mix of primers was used which bind to different locations. For example, Kozich et al. 2013 used a mix of forward primers for V3 and V4 with reverse primers for V4 and V5, giving three types of amplicon: V3-V4, V4, and V3-V5. These must be separated before making OTUs.

In other cases, there may be "staggered" primers with small offsets (like GATTACA and ATTACAT) to increase image diversity (similar to using a PhiX spike-in) or to reduce the number of species with mismatches. With staggered primers, the reads must be trimmed to the same position before dereplication, so in my toy example the G should be deleted from reads starting with GATTACA (this is an example of what I call global trimming). However, the cluster_otus and unoise3 commands are designed to handle some types of staggered primer, so this may not be necessary.

You can check for offset sequences using cluster_fast because it does not count terminal gaps as differences. For example,

usearch -cluster_fast otus.fa -id 0.97 -strand both -alnout otus.aln -show_termgaps \
  -userout user.txt -userfields query+target+qstrand+qlo+qhi+tlo+thi

Review the alignments in otus.aln for terminal gaps (note the -show_termgaps option), or look for qlo or tlo values in user.txt which are greater than 1. For example, this Linux command will show qlo values which are not 1:

cut -f4 user.txt | grep -v "^1$"
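
The cut command above only looks at qlo. As a rough sketch, the same check can be extended to tlo and made to print the whole hit, so you can see which query and target are offset. This assumes user.txt is tab-separated and uses exactly the -userfields order given above, so that qlo is column 4 and tlo is column 6. The function name show_offsets is my own invention for illustration; it is not part of usearch.

```shell
# Print every hit whose alignment does not start at position 1 in the
# query (qlo, column 4) or in the target (tlo, column 6).
# Assumes a tab-separated file with the field order
# query+target+qstrand+qlo+qhi+tlo+thi.
show_offsets() {
  awk -F'\t' '$4 != 1 || $6 != 1' "$1"
}
```

After defining the function, run it as show_offsets user.txt. No output means that no hit starts at an offset in either sequence.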