




FASTQ format parameters

See also
FASTQ files
  Quality scores
  Wikipedia article on FASTQ
  Cock et al. (2010) paper describing FASTQ

FASTQ formats
Unfortunately, the FASTQ format is not standardized. There are several variants in common use, and it is not possible to distinguish them automatically with high reliability. The main parameters are the minimum and maximum Q scores and the ASCII_BASE constant.

The fastq_chars command can be used to guess the format of a FASTQ file.
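To illustrate why automatic detection is unreliable, here is a minimal Python sketch of a guessing heuristic. It is not the algorithm used by fastq_chars; the function name guess_ascii_base is hypothetical, and the sketch assumes only two candidate values of ASCII_BASE (33 and 64) with Q scores between 0 and 41.

```python
def guess_ascii_base(quality_lines):
    """Guess ASCII_BASE (33 or 64) from quality strings; None if ambiguous.

    Assumes Q scores lie in 0..41, so base 33 uses codes 33..74
    and base 64 uses codes 64..105. Codes 64..74 fit both.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None          # no data, nothing to guess from
    if min(codes) < 64:
        return 33            # codes below '@' cannot occur with base 64 and Q >= 0
    if max(codes) > 74:
        return 64            # codes above 'J' would mean Q > 41 with base 33
    return None              # every character falls in the overlap


print(guess_ascii_base(["!5I"]))          # 33
print(guess_ascii_base(["hhhh"]))         # 64
print(guess_ascii_base(["@ABCDEFGHIJ"]))  # None
```

The last call shows the problem: a file whose quality characters all fall in the overlapping range is consistent with either variant.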

ASCII coding of Q scores
The Q value is coded as a printable ASCII character using Q = ASCII_CODE - ASCII_BASE. Here, ASCII_CODE is the ASCII code for the character and ASCII_BASE is a constant. The original Sanger FASTQ format used ASCII_BASE = 33, so, for example, if the quality score is coded as 'C' then Q = ASCII_CODE('C') - 33 = 67 - 33 = 34. See here for tables mapping ASCII characters to Q scores for common variants of FASTQ.
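The formula can be sketched in a few lines of Python. The function name decode_quality is illustrative only and is not part of any tool described here.

```python
def decode_quality(qual, ascii_base=33):
    """Convert a FASTQ quality string to a list of Q scores."""
    # Q = ASCII_CODE - ASCII_BASE, applied to each character in turn
    return [ord(ch) - ascii_base for ch in qual]


print(decode_quality("C"))        # [34]
print(decode_quality("!5I"))      # [0, 20, 40]
print(decode_quality("h", 64))    # [40]
```

The third call uses ASCII_BASE = 64, the other value commonly seen in older files, to show that the same character decodes to a different Q score depending on the constant.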

Option          Default  Description
-fastq_ascii    33       ASCII_BASE constant described above.
-fastq_qmin     0        Minimum Q score.
-fastq_qmax     41       Maximum Q score (input files).
-fastq_qmaxout  41       Maximum Q score (output files).
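As a rough illustration of how the three input parameters fit together, the Python sketch below decodes a quality string and checks each score against the allowed range. The function name check_quality is hypothetical, and raising an error on an out-of-range score is an assumption made for this sketch; this page does not say how the program itself responds.

```python
def check_quality(qual, ascii_base=33, qmin=0, qmax=41):
    """Decode a quality string, raising ValueError on out-of-range Q scores.

    Defaults mirror -fastq_ascii, -fastq_qmin and -fastq_qmax above.
    """
    scores = []
    for ch in qual:
        q = ord(ch) - ascii_base
        if not qmin <= q <= qmax:
            # Assumed behavior for this sketch: reject the string
            raise ValueError(
                f"Q={q} for character {ch!r} is outside [{qmin}, {qmax}]"
            )
        scores.append(q)
    return scores


print(check_quality("!J"))   # [0, 41]
```

With the defaults, 'K' (ASCII code 75) decodes to Q = 42 and would be rejected. An error like this often means the wrong ASCII_BASE was assumed for the file.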