Strategies for dealing with low-quality reverse reads (R2s)

See also
  fastq_mergepairs command
  fastq_mergepairs options
  fastq_filter command

Sometimes, reverse reads have substantially lower quality than forward reads, especially with longer read lengths. This can cause a low rate of merged reads and/or a large number of reads being discarded by quality filtering. The following strategies can help.

Truncate all R2s to a shorter fixed length
Typically, quality drops towards the end of the read. You can check for this using the fastq_eestats2 command, and then consider truncating all R2s before merging by using the fastx_truncate command.
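To illustrate why a shorter fixed length can help, here is a minimal Python sketch that sums per-base error probabilities (expected errors) over the first N bases of a read. This is not how fastq_eestats2 is implemented; the function name `expected_errors` and the quality profile `r2_quals` are made up for illustration, and the only assumption is the standard Phred relation P = 10^(-Q/10).

```python
def expected_errors(quals, length=None):
    """Sum of per-base error probabilities over the first `length` bases.

    `quals` is a list of integer Phred Q scores; P(error) = 10^(-Q/10).
    """
    prefix = quals if length is None else quals[:length]
    return sum(10 ** (-q / 10) for q in prefix)


# Hypothetical R2 quality profile: good start, decaying tail.
r2_quals = [38] * 150 + [25] * 50 + [12] * 50

# Compare candidate truncation lengths.
for length in (150, 200, 250):
    print(length, round(expected_errors(r2_quals, length), 2))
```

In this made-up profile, almost all of the expected errors come from the last 50 bases, so truncating at 200 removes most of the error while keeping most of the read.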

Truncate R2s using a minimum Q score
Usually, quality filtering that gives variable-length reads is not recommended (see global trimming for discussion). However, if the reads will be merged, then truncating the forward or reverse reads to discard low-quality bases can be effective because the merged sequence will be globally trimmed regardless of end trimming.

The fastq_trunctail option (default value 2) truncates both the forward and reverse reads before the first base with the given quality score. Q=2 means a probability of 63% that the base call is wrong (P = 10^(-Q/10)), so this is a pretty conservative threshold. You might consider increasing it to 3, 4 or 5. Note that this will truncate both the R1 and R2.
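The 63% figure follows from the standard Phred definition, P = 10^(-Q/10). A minimal Python sketch (the function name `phred_to_error_prob` is just for illustration) shows the error probability for the thresholds mentioned above:

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


for q in (2, 3, 4, 5):
    print(f"Q={q}: P(error) = {phred_to_error_prob(q):.3f}")
```

Even at Q=5 the error probability is still about 32%, so thresholds in this range only remove very poor base calls.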

There are two alternative ways to do this which allow setting different Q score thresholds for the R1s and R2s: the fastq_filter command with the fastq_truncqual option, or the fastq_mergepairs command with the fastq_minqual option. Note that the definitions of these options are different: fastq_truncqual truncates at the first Q score which is less than or equal to the given value, while fastq_minqual truncates at the first Q score which is less than the given value.
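The difference between the two definitions can be sketched in plain Python. This is only an illustration of the rules as stated above, not USEARCH code; the function names and the example Q scores are hypothetical.

```python
def truncate_at_or_below(quals, q):
    """Keep bases before the first Q score <= q (the 'less than or equal' rule)."""
    for i, score in enumerate(quals):
        if score <= q:
            return quals[:i]
    return quals


def truncate_below(quals, q):
    """Keep bases before the first Q score < q (the 'less than' rule)."""
    for i, score in enumerate(quals):
        if score < q:
            return quals[:i]
    return quals


# Hypothetical per-base Q scores for one read.
quals = [35, 30, 12, 4, 3, 20, 2]

print(len(truncate_at_or_below(quals, 3)))  # cuts at the base with Q=3
print(len(truncate_below(quals, 3)))        # Q=3 survives; cuts at the base with Q=2
print(len(truncate_below(quals, 4)))        # same cut as the "<= 3" rule
```

Because Q scores are integers, "less than Q+1" selects the same bases as "less than or equal to Q", so the two rules agree when the second threshold is one higher than the first.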

Using fastq_filter has the advantage that you can analyze the FASTQ output using commands such as fastq_eestats2. This is useful for choosing the minimum Q score: try different values and review the results. Note that if you filter forward and reverse reads separately, you will get different sets of reads. Alternatively, you can use fastq_filter to tune the value of fastq_truncqual (Q), then set the fastq_minqual option of fastq_mergepairs to Q+1.

I suggest trying minimum Q score values between 2 and 5.

Discard reverse reads and make OTUs from forward reads only
Sometimes, the reverse reads are so bad that it is better to discard them and make OTUs from the forward reads only. This can be a difficult decision because it is hard to throw expensive data away. One approach is to make several sets of OTUs using different strategies (e.g., forward only, merged) and compare the results. The trade-off is between better phylogenetic resolution with more bases versus reduced sensitivity to low-abundance species with fewer reads. Sometimes, it is reasonable to use different sets of OTUs for different types of analysis.