FASTQ files

USEARCH v11

See also
  Quality scores
  Average Q is a bad idea!
  FASTQ format options
  Wikipedia article on FASTQ
  Expected errors
  Cock et al. (2010) paper describing FASTQ

FASTQ files are text files containing sequence data with a quality (Phred) score for each base, represented as an ASCII character. The quality score is an integer (Q), typically in the range 2 to 40, though higher and lower values are sometimes used. In particular, versions 1.8 and later of the Illumina platform generate reads with Q scores up to 41.
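
As an illustration of how a quality character maps to a Q score, here is a minimal Python sketch. It assumes an ASCII base of 33 and the standard Phred definition, under which the error probability is P = 10^(-Q/10). The function names phred_from_char and error_probability are hypothetical helpers, not part of USEARCH.

```python
def phred_from_char(ch, ascii_base=33):
    """Decode one quality character to its integer Q score.

    ascii_base=33 is an assumption; some FASTQ variants use 64.
    """
    return ord(ch) - ascii_base


def error_probability(q):
    """Standard Phred relation: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


quals = "II5#"
scores = [phred_from_char(c) for c in quals]
print(scores)  # [40, 40, 20, 2]
for q in scores:
    print(q, error_probability(q))
```

With base 33, the character 'J' decodes to Q=41, the highest score mentioned above for recent Illumina reads.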

Unfortunately, the FASTQ format is not standardized. There are several variants in common use, and it is not possible to distinguish them automatically with high reliability. The fastq_chars command can be used to guess the format of an unknown file. See FASTQ format options.
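
To show why the variant can only be guessed, the following Python sketch applies one simple heuristic to the range of quality characters seen in a file. This is an illustrative assumption, not a description of how fastq_chars works: characters below ASCII 59 cannot occur with base 64 (the lowest base-64 score in common use is -5), and characters above 'J' (ASCII 74) would imply Q above 41 with base 33, which is unusual. The function name guess_ascii_base is hypothetical.

```python
def guess_ascii_base(quality_lines):
    """Guess the ASCII base (33 or 64) from quality strings.

    Returns None when the observed characters fit both variants.
    This is a heuristic sketch, not a reliable detector.
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # With base 64 these characters would mean Q below -5.
        return 33
    if hi > 74:
        # With base 33 these characters would mean Q above 41.
        return 64
    return None  # e.g. only characters in ';'..'J': ambiguous


print(guess_ascii_base(["II5#"]))    # 33
print(guess_ascii_base(["hhhhdd"]))  # 64
print(guess_ascii_base(["IIII"]))    # None
```

The ambiguous case is the reason automatic detection is unreliable: a file containing only high-quality base-33 reads can look the same as a file of low-quality base-64 reads.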

Figure: FASTQ read with 50 base calls in Illumina format (ASCII_BASE=33).
There are always four lines per read. The first line starts with '@', followed by the label.
The third line starts with '+'. In some variants, the '+' line contains a second copy of the label.
The fourth line contains the Q scores represented as ASCII characters.
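
The four-line layout described above can be read with a short Python sketch like the one below. It assumes strictly four lines per read with no wrapped sequence or quality lines, and the name read_fastq is a hypothetical helper rather than anything provided by USEARCH.

```python
import io


def read_fastq(handle):
    """Yield (label, sequence, quality) tuples from a four-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if header == "":
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("label line must start with '@': %r" % header)
        if not plus.startswith("+"):
            raise ValueError("third line must start with '+': %r" % plus)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ for %r" % header)
        # Any second copy of the label after '+' is ignored.
        yield header[1:], seq, qual


example = io.StringIO("@read1\nACGT\n+\nII5#\n@read2\nGG\n+read2\n!!\n")
records = list(read_fastq(example))
print(records)
```

Checking that the sequence and quality lines have equal length is a cheap way to catch truncated or corrupted records early.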