USEARCH v11

Read preparation: quality filtering

See also
  OTU / denoising pipeline
  Read preparation
  Defining and interpreting OTUs
  Expected errors
 
To get good OTU sequences, low-quality reads should be discarded because they often cause spurious OTUs. I strongly recommend expected error filtering with the fastq_filter command, which is much more effective than most other quality filters.

Discarding singletons is also an effective strategy for quality filtering.

Quality filtering should be performed after paired read merging, stripping primers and length trimming.

Paired read merging should be done before quality filtering because the posterior Q scores in the overlapping region are more accurate. You should use the usearch fastq_mergepairs command to get this benefit, because most other paired read assemblers generate incorrect consensus Q scores. The most notable example is PANDAseq, which systematically reduces Q scores at positions where both reads agree.

Trimming should be done before quality filtering because trimming always reduces the number of expected errors (e.e.), so e.e. will be overestimated if it is calculated before trimming.
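
The effect of trimming on expected errors can be illustrated with a minimal Python sketch. It assumes the standard definition of expected errors as the sum of the per-base error probabilities P = 10^(-Q/10), and it assumes Phred+33 quality encoding. The function name and the quality string are made up for illustration; this is not USEARCH code.

```python
def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10).

    quals is a FASTQ quality string; offset=33 assumes Phred+33 encoding.
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quals)


# Hypothetical read: 10 bases at Q40 ('I') followed by a 5-base tail at Q10 ('+').
quals = "I" * 10 + "+" * 5

ee_full = expected_errors(quals)          # calculated before trimming
ee_trimmed = expected_errors(quals[:10])  # calculated after trimming the tail

print(f"e.e. before trimming: {ee_full:.3f}")
print(f"e.e. after trimming:  {ee_trimmed:.3f}")
```

In this made-up read almost all of the expected errors come from the low-quality tail, so a read that will be trimmed anyway could be discarded needlessly if e.e. were calculated on the untrimmed read with a stricter threshold.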

I recommend setting the maximum expected error threshold to 1.0, regardless of the read length.

Example

usearch -fastq_filter trimmed.fq -fastq_maxee 1.0 -fastaout filtered.fa
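
To make the idea behind the command concrete, here is a conceptual Python sketch of maximum expected error filtering. It is not how USEARCH is implemented. It assumes Phred+33 quality encoding and simple four-line FASTQ records, and the names read_fastq and filter_maxee and the two demo reads are hypothetical.

```python
import io


def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10) (Phred+33 assumed)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quals)


def read_fastq(handle):
    """Yield (label, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def filter_maxee(handle, max_ee=1.0):
    """Yield (label, sequence) for reads with at most max_ee expected errors."""
    for label, seq, qual in read_fastq(handle):
        if expected_errors(qual) <= max_ee:
            yield label, seq


# Two made-up reads: one at Q40 throughout ('I'), one at Q2 throughout ('#').
demo = io.StringIO(
    "@good\nACGTACGT\n+\nIIIIIIII\n"
    "@bad\nACGTACGT\n+\n########\n"
)

kept = list(filter_maxee(demo, max_ee=1.0))
for label, seq in kept:
    print(f">{label}\n{seq}")  # FASTA output, Q scores are no longer needed
```

The output is FASTA rather than FASTQ because the quality scores have served their purpose once the read has passed the filter.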

Validating quality filtering

The best way to validate the effectiveness of quality filtering, and the other steps in your pipeline, is to use control samples with known composition. If you don't have control samples, the fastx_learn command can be used to estimate the error rate de novo. This can be used as a check that the error rate after quality filtering is low, i.e. that the Q scores give good predictions of base call errors.

For some machines, e.g. 454 and Ion Torrent, the Q scores are much less effective than they are for Illumina because they are estimates of the homopolymer length error, not of the base call error. Using fastx_learn can reveal this type of problem. If that happens, a more effective quality filtering strategy is to increase the minimum abundance threshold. The minimum abundance can be tuned using a mock community control sample.
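
The minimum abundance idea can be sketched in a few lines of Python. This is only an illustration of the concept, not USEARCH code: the function name abundant_uniques and the toy reads are hypothetical. Identical reads are counted, and unique sequences seen fewer than min_size times are discarded. Setting min_size=2 corresponds to discarding singletons, and a larger value gives a stricter filter.

```python
from collections import Counter


def abundant_uniques(seqs, min_size=2):
    """Return (sequence, count) pairs with count >= min_size, most abundant first."""
    counts = Counter(seqs)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_size]


# Toy reads: "ACGA" appears once and could be a read error derived from "ACGT".
reads = ["ACGT", "ACGT", "ACGT", "ACGA", "TTGC", "TTGC"]

uniques = abundant_uniques(reads, min_size=2)
for seq, n in uniques:
    print(f"{seq}\tsize={n}")
```

With a mock community, min_size could be increased step by step until the sequences that survive match the known composition.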