
Quality control for OTU sequences

See also
OTU / denoising analysis
Defining and interpreting OTUs
Control samples

Amplicon reads often contain artifacts that my recommended pipeline does not filter, because they vary widely between datasets and it would be difficult to account for all of them in a single set of commands. It is generally easier to identify them by manually analyzing the OTU sequences rather than the reads, because the set of OTU sequences is much smaller. Of course, if you are going to run a pipeline repeatedly on reads obtained from similar libraries, it makes sense to modify the pipeline to filter the types of artifact you find.

Here, I describe quality control checks that I use in my own work, with links to discussion and commands. If you encounter other artifacts in your data, please let me know and I will update this page.

See control samples for discussion of how to use controls to better understand your data.

Issue               Description
Alignments          Do the OTU sequences align well to a reference database for your gene?
Missing OTUs        Do all OTUs appear in the OTU table?
Coverage            How much of the data is explained by the OTUs?
Short constructs    Bad sequencing construct created by PCR
Strand duplicates   Sequences of both plus and minus strands
Offsets             Sequences start at different positions in the gene
Cross-talk          Reads assigned to the wrong sample
Sequence error      Polymerase errors and bad base calls
Low complexity      Sequencer noise
PhiX                Unfiltered spike-in
Chimeras            Unfiltered PCR chimeras
Mistargeting        Primers amplify a different region
Contaminants        Self-explanatory
Primers             Primer-binding sequences should be stripped at the start of the pipeline
Tight OTUs          OTUs >97% identical
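
Two of the checks above (strand duplicates and missing OTUs) are simple enough to sketch in a few lines of Python. This is only an illustration, not the commands linked from this page: the function names are hypothetical, the strand check finds only exact reverse-complement matches (real data would usually need an alignment-based search on both strands), and the OTU table is assumed to be tab-separated text with the OTU label in the first column and a header line starting with "#".

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def read_fasta(lines):
    """Parse FASTA text lines into a {label: sequence} dict."""
    seqs = {}
    label = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the label line.
            label = line[1:].split()[0]
            seqs[label] = ""
        elif label is not None:
            seqs[label] += line.upper()
    return seqs


def find_strand_duplicates(seqs):
    """Return sorted pairs (a, b) where b is the exact reverse complement of a."""
    by_seq = {s: name for name, s in seqs.items()}
    pairs = []
    for name, s in seqs.items():
        other = by_seq.get(reverse_complement(s))
        # name < other reports each pair once and skips palindromes.
        if other is not None and name < other:
            pairs.append((name, other))
    return sorted(pairs)


def find_missing_otus(seqs, table_lines):
    """Return OTU labels present in the FASTA but absent from the OTU table.

    Assumed table format: tab-separated, OTU label in the first column,
    header line starting with '#'.
    """
    in_table = set()
    for line in table_lines:
        if line.startswith("#") or not line.strip():
            continue
        in_table.add(line.split("\t")[0].strip())
    return sorted(set(seqs) - in_table)


# Toy data, made up for illustration only.
fasta_text = """>Otu1
AACCGGTTAC
>Otu2
GTAACCGGTT
>Otu3
ACGTACGGTA
"""
table_text = "#OTU ID\tS1\tS2\nOtu1\t10\t3\nOtu2\t4\t7\n"

otus = read_fasta(fasta_text.splitlines())
strand_dups = find_strand_duplicates(otus)
missing = find_missing_otus(otus, table_text.splitlines())
print("Strand duplicates:", strand_dups)
print("Missing from OTU table:", missing)
```

In the toy data, Otu2 is the reverse complement of Otu1, and Otu3 has no row in the table, so both checks report something. A quick screen like this is easy to run by hand on OTU sequences because there are so few of them compared with the reads.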
