Average Q score

USEARCH v11

Calculating average Q (Phred) scores is a bad idea

 
See also: Quality scores, FASTQ files

Read quality filters often use an average Q score to decide whether a read has high or low quality. This is a really bad idea! The average Q score is not a good indicator of the number of errors predicted by the individual Q scores, as the example in the table below shows for two reads of length 150 nt.

Q scores in read        Avg. Q    Expected number of errors
140 x Q35 + 10 x Q2     33        6.4 !
150 x Q25               25        0.5

Notice that the read with the higher average Q has a much larger expected number of errors. This is due to the Q2 bases: from P = 10^(-Q/10), each has an error probability of about 0.63, so we expect roughly six of the ten Q2 bases to be wrong, and these account for almost all of the 6.4 expected errors in the read. As this example shows, if there are a few low Q scores in a read with generally high Q scores, then the average Q is a very poor indicator of the expected accuracy of the read.
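
The comparison can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard Phred definition P = 10^(-Q/10) and reads given as plain lists of integer Q scores; the function names are illustrative and are not part of USEARCH.

```python
def error_probability(q):
    """Error probability for one base, from the Phred definition P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def average_q(q_scores):
    """Arithmetic mean of the Q scores in a read."""
    return sum(q_scores) / len(q_scores)


def expected_errors(q_scores):
    """Expected number of errors: the sum of the per-base error probabilities."""
    return sum(error_probability(q) for q in q_scores)


# The two 150 nt reads described in the text.
read_a = [35] * 140 + [2] * 10
read_b = [25] * 150

for name, read in (("140 x Q35 + 10 x Q2", read_a), ("150 x Q25", read_b)):
    print(f"{name}: avg Q = {average_q(read):.1f}, "
          f"expected errors = {expected_errors(read):.2f}")
```

After rounding, the printed values should agree with the example: the first read has the higher average Q but more than ten times as many expected errors as the second.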