usearch manual
Read preparation: length trimming

See also
  OTU / denoising pipeline
  Read preparation
  Trimming fungal ITS reads

To get good OTU sequences, reads must be trimmed so that all sequences derived from the same biological template start at the same position in the gene and have the same length. I call this "global trimming". It is required to get accurate unique sequence abundances. Accurate abundances are needed by the UPARSE algorithm (cluster_otus command) and the UNOISE algorithm (unoise3 command) because they assume that high-abundance sequences are much more likely to be correct biological sequences.
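The effect on abundances can be seen with a toy example. This is only a sketch using standard shell tools, not how usearch counts uniques; count_uniques is a hypothetical helper name, and the four reads are made up for illustration. Three reads come from the same template but end at different positions, and one read is a variant.

```shell
# Hypothetical helper: truncate each sequence (one per line on stdin) to LEN
# bases, drop sequences shorter than LEN, and print "abundance sequence"
# sorted by decreasing abundance.
count_uniques() {
    len="$1"
    awk -v n="$len" 'length($0) >= n { print substr($0, 1, n) }' \
        | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Made-up reads: three from one template with ragged 3' ends, one variant.
reads='ACGTACGTAC
ACGTACGTACGG
ACGTACGTACG
ACGTTCGTACGG'

# Untrimmed: all four reads differ, so every unique has abundance 1.
printf '%s\n' "$reads" | sort | uniq -c

# Trimmed to 10 bases: the template collapses to one unique with abundance 3,
# and the variant remains a separate unique with abundance 1.
printf '%s\n' "$reads" | count_uniques 10
```

Without trimming, the correct template sequence looks no more abundant than the variant, which defeats the abundance assumption described above.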

It is fine for reads of different biological sequences to have different lengths because of natural variation in the length of the gene or region. See Trimming fungal ITS reads.

With overlapping paired reads, length trimming as such is usually not necessary because the reverse reads start at a primer-binding locus, and the merged sequence therefore always ends at that locus. However, you should still trim the primer sequences.
 
Length trimming may be needed if you have unpaired reads that vary in length in the raw data files and/or have lower quality towards the 3' ends, as is often the case with 454 or Ion Torrent reads. You can choose an appropriate trim length using the fastq_eestats2 command.
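Before choosing a trim length, it can also help to look at how read lengths are distributed. The sketch below is not a substitute for fastq_eestats2, because it ignores quality scores entirely. length_hist and frac_kept are hypothetical helper names, and both assume a FASTQ file with exactly four lines per record. frac_kept further assumes that reads shorter than the trim length would be discarded.

```shell
# Hypothetical helper: print "length count" for each read length found in a
# 4-line-per-record FASTQ stream on stdin, sorted by length.
length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
        | sort -k1,1n
}

# Hypothetical helper: fraction of reads with at least LEN bases, i.e. the
# fraction kept if shorter reads are discarded (an assumption of this sketch).
frac_kept() {
    awk -v n="$1" 'NR % 4 == 2 { total++; if (length($0) >= n) kept++ }
                   END { if (total) printf "%.2f\n", kept / total }'
}

# Made-up three-read FASTQ for demonstration.
fq='@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTAC
+
IIIIII
@r3
ACGTACGT
+
IIIIIIII'

printf '%s\n' "$fq" | length_hist   # one "length count" line per length
printf '%s\n' "$fq" | frac_kept 8   # fraction of reads with >= 8 bases
```

On real data you would run, for example, length_hist < reads.fq and then try frac_kept with a few candidate lengths to see how many reads each choice would cost.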

You can trim to a fixed length using the fastx_truncate command. For example, a length of around 250 is often a good choice for 454 reads, which can be done like this:

usearch -fastx_truncate reads.fq -trunclength 250 -fastqout reads250.fq
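
To make clear what fixed-length truncation does to each record, here is a minimal sketch in awk. It is an illustration only, not the usearch implementation. truncate_fastq is a hypothetical helper name, the input is assumed to have exactly four lines per record, and dropping reads shorter than the target length is an assumption made here so that all output reads have the same length.

```shell
# Hypothetical helper: truncate every record of a 4-line FASTQ stream on stdin
# to LEN bases, cutting the sequence and quality strings at the same position.
# Assumption: reads shorter than LEN are dropped.
truncate_fastq() {
    awk -v n="$1" '
        NR % 4 == 1 { head = $0 }   # label line
        NR % 4 == 2 { seq = $0 }    # sequence line
        NR % 4 == 0 {               # quality line completes the record
            if (length(seq) >= n)
                printf "%s\n%s\n+\n%s\n", head, substr(seq, 1, n), substr($0, 1, n)
        }'
}

# Made-up input: r1 has 8 bases and is cut to 5; r2 has 3 bases and is dropped.
printf '@r1\nACGTACGT\n+\nIIIIHHHH\n@r2\nACG\n+\nIII\n' | truncate_fastq 5
```

The sequence and quality lines must always be cut together, otherwise the FASTQ record becomes invalid.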