fastq_filter command
USEARCH v11

Performs quality filtering and/or conversion of a FASTQ file to FASTA format.

See also
  Paper describing expected error filtering and paired read merging (Edgar & Flyvbjerg, 2015).
  Paired read assembler and quality filtering benchmark results
 
Read quality filtering
  Expected errors
 
FASTQ format options
  Quality scores

  Choosing FASTQ filter parameters
  Strategies for dealing with low-quality reverse reads (R2s)

The fastx_learn command is useful for checking the error rate after expected error quality filtering. Expected error filtering assumes that the Q scores are accurate; fastx_learn does not use Q scores, so it gives an independent check.

Output options

Option   Description
-fastqout filename   FASTQ output file. You can use both -fastqout and -fastaout.
 
-fastaout filename   FASTA output file. You can use both -fastqout and -fastaout.
 
-fastqout_discarded filename   FASTQ output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded.
 
-fastaout_discarded filename   FASTA output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded.
 
-relabel prefix   Generate new labels for the output sequences. They will be labeled prefix1, prefix2 and so on. For example, if you use -relabel SampleA. then the labels will be SampleA.1, SampleA.2 etc.

The special value @ indicates that the string should be constructed from the file name by truncating the file name at the first underscore or period and appending a period. With a typical Illumina FASTQ file name, this gives the sample name. So, for example, if the FASTQ file name is Mock_S188_L001_R1_001.fastq, then the string is Mock and the output labels will be Mock.1, Mock.2 etc.
 

-fastq_eeout   Append the expected number of errors according to the Q scores to the label in the format "ee=xx;". Expected errors are calculated after truncation, if applicable.
 
-sample string   Append sample=string; to the read label.
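
The -relabel rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not USEARCH's own code: the function names are hypothetical, and stripping the directory part of the path before applying the @ rule is an assumption.

```python
import os
from typing import List


def relabel_prefix_from_filename(path: str) -> str:
    """Sketch of the '@' rule: cut the file name at the first '_' or '.',
    then append a period. Dropping the directory part is an assumption."""
    name = os.path.basename(path)
    cut = len(name)
    for sep in ("_", "."):
        pos = name.find(sep)
        if pos != -1:
            cut = min(cut, pos)
    return name[:cut] + "."


def relabel(n_reads: int, prefix: str) -> List[str]:
    """New labels are the prefix followed by a counter starting at 1."""
    return [f"{prefix}{i}" for i in range(1, n_reads + 1)]


prefix = relabel_prefix_from_filename("Mock_S188_L001_R1_001.fastq")
print(prefix)              # Mock.
print(relabel(2, prefix))  # ['Mock.1', 'Mock.2']
```

With -relabel SampleA. the same numbering gives SampleA.1, SampleA.2 and so on.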
 

Filtering options

Option   Description
-fastq_truncqual N   Truncate the read at the first position having quality score <= N, so that all remaining Q scores are >N.
 
-fastq_maxee E   Discard reads with > E total expected errors for all bases in the read after any truncation options have been applied.
 
-fastq_trunclen L   Truncate sequences at the L'th base. If the sequence is shorter than L, discard.
 
-fastq_minlen L   Discard sequences with < L letters.
 
-fastq_stripleft N   Delete the first N bases in the read.
 
-fastq_maxee_rate E   Discard reads with > E expected errors per base. Calculated after any truncation options have been applied. For example, with -fastq_maxee_rate set to 0.01, a read of length 100 is discarded if its expected errors are >1, and a read of length 1,000 is discarded if its expected errors are >10.
 
-fastq_maxns k   Discard if there are >k Ns in the read.
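
To make the filtering options concrete, here is a minimal Python sketch of how they could act on a single read. It assumes the usual definition of expected errors as the sum of per-base error probabilities, P = 10^(-Q/10), and takes Q scores as a list of integers. The function names and the order of the steps are assumptions for illustration, except that the expected error tests run after truncation, as the option descriptions state. It is not USEARCH's implementation.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, stripleft=0, truncqual=None, trunclen=None,
                minlen=None, maxee=None, maxee_rate=None, maxns=None):
    """Return (seq, quals) if the read passes, or None if it is discarded.

    Step order is an assumption, apart from expected errors being
    computed after truncation.
    """
    # -fastq_stripleft N: delete the first N bases.
    seq, quals = seq[stripleft:], quals[stripleft:]
    # -fastq_truncqual N: truncate at the first position with Q <= N.
    if truncqual is not None:
        for i, q in enumerate(quals):
            if q <= truncqual:
                seq, quals = seq[:i], quals[:i]
                break
    # -fastq_trunclen L: discard if shorter than L, else keep L bases.
    if trunclen is not None:
        if len(seq) < trunclen:
            return None
        seq, quals = seq[:trunclen], quals[:trunclen]
    # -fastq_minlen L: discard reads with fewer than L letters.
    if minlen is not None and len(seq) < minlen:
        return None
    # -fastq_maxns k: discard reads with more than k Ns.
    if maxns is not None and seq.upper().count("N") > maxns:
        return None
    # -fastq_maxee E and -fastq_maxee_rate E use the truncated read.
    ee = expected_errors(quals)
    if maxee is not None and ee > maxee:
        return None
    if maxee_rate is not None and len(seq) > 0 and ee / len(seq) > maxee_rate:
        return None
    return seq, quals


# 200 bases at Q30, truncated to 150: about 0.15 expected errors, kept.
kept = filter_read("A" * 200, [30] * 200, trunclen=150, maxee=0.5)
# 200 bases at Q20, truncated to 150: about 1.5 expected errors, discarded.
dropped = filter_read("A" * 200, [20] * 200, trunclen=150, maxee=0.5)
```

Here `kept` holds the 150-base truncated read and `dropped` is None, which shows why a read of uniform Q20 fails a -fastq_maxee 0.5 threshold at length 150 while a read of uniform Q30 passes.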
 

Examples

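"Raw" conversion of FASTQ to FASTA with no filtering, for a file whose quality scores use an ASCII offset of 64 (older Illumina encodings; Sanger-format FASTQ uses an offset of 33):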

  usearch -fastq_filter reads.fastq -fastaout reads.fasta -fastq_ascii 64
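
The -fastq_ascii value in the command above is, as far as the FASTQ format conventions go, the ASCII code that represents Q=0. A small Python sketch of that decoding follows; the function name is hypothetical, and the default of 33 reflects the Sanger / current Illumina convention rather than anything stated on this page.

```python
def decode_quals(qual_line, ascii_base=33):
    """Q score of each character is its ASCII code minus the base value."""
    return [ord(c) - ascii_base for c in qual_line]


print(decode_quals("II5"))                 # [40, 40, 20]
print(decode_quals("hhT", ascii_base=64))  # [40, 40, 20]
```

The same quality values appear as different characters under the two offsets, so decoding a file with the wrong base shifts every Q score by 31.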

Truncate to length 150, discard if expected errors > 0.5, and convert to FASTA:

  usearch -fastq_filter reads.fastq -fastq_trunclen 150 -fastq_maxee 0.5 \
    -fastaout reads.fasta

Truncate a read at the first position with Q<=15 and then discard it if fewer than 100 bases remain, output to new FASTQ file:

  usearch -fastq_filter reads.fastq -fastq_minlen 100 -fastq_truncqual 15 \