
Quality control for OTU sequences

See also
OTU / denoising analysis
  Defining and interpreting OTUs
  Control samples

Amplicon reads often contain artifacts that are not filtered by my recommended pipeline, because they vary widely between datasets and it would be difficult to account for all of them in a single set of commands. It is generally easier to identify them by manually analyzing the OTU sequences rather than the reads, because the set of OTUs is much smaller. Of course, if you are going to run a pipeline repeatedly on reads obtained from similar libraries, it makes sense to modify the pipeline to filter the types of artifact you find.

Here, I describe quality control checks that I use in my own work, with links to discussion and commands. If you encounter other artifacts in your data, please let me know and I will update this page.

See control samples for discussion of how to use controls to better understand your data.

Issue   Description
Alignments   Do the OTU sequences align well to a reference database for your gene?
Missing OTUs   Do all OTUs appear in the OTU table?
Coverage   How much of the data is explained by the OTUs?
Short constructs   Bad sequencing constructs created by PCR
Strand duplicates   Sequences of both plus and minus strands
Offsets   Sequences start at different positions in the gene
Cross-talk   Reads assigned to the wrong sample
Sequence error   Polymerase errors and bad base calls
Low complexity   Sequencer noise
PhiX   Unfiltered spike-in
Chimeras   Unfiltered PCR chimeras
Mistargeting   Primers amplify a different region
Contaminants   Self-explanatory
Primers   Primer-binding sequences should be stripped at the start of the pipeline
Tight OTUs   OTUs >97% identical
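
Two of the checks above (missing OTUs and strand duplicates) are simple enough to sketch in a few lines of Python. This is a minimal illustration rather than the commands linked from this page. It assumes OTU sequences in FASTA format, an OTU table that is tab-separated with the OTU label in the first column and header lines starting with `#`, and it only finds strand duplicates that are exact reverse complements. Near-identical duplicates would need an alignment-based search. All function names are hypothetical.

```python
# Sketch of two OTU quality control checks; file formats are assumptions.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Parse FASTA lines into a dict mapping label -> upper-case sequence."""
    seqs = {}
    label = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            seqs[label] = ""
        elif label is not None:
            seqs[label] += line.upper()
    return seqs


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def missing_otus(otu_seqs, table_lines):
    """Return OTU labels present in the FASTA but absent from the table.

    Assumes a tab-separated table with the OTU label in column one and
    header lines starting with '#'.
    """
    in_table = set()
    for line in table_lines:
        if line.startswith("#") or not line.strip():
            continue
        in_table.add(line.split("\t")[0].strip())
    return sorted(set(otu_seqs) - in_table)


def strand_duplicates(otu_seqs):
    """Return (label, label) pairs whose sequences are exact reverse complements."""
    by_seq = {}
    for label, seq in otu_seqs.items():
        by_seq.setdefault(seq, []).append(label)
    pairs = []
    for label, seq in otu_seqs.items():
        for other in by_seq.get(reverse_complement(seq), []):
            if label < other:  # report each pair once, skip self-matches
                pairs.append((label, other))
    return sorted(pairs)
```

To use it, pass open file handles (or lists of lines) for the OTU FASTA file and the OTU table. A non-empty result from either function indicates that the corresponding check in the table above deserves a closer look.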