fastx_subsample command

The fastx_subsample command generates a random subset of the sequences in a FASTA or FASTQ file. The subset is written to the file(s) given by the -fastaout and/or -fastqout options.
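To illustrate what "a random subset" means, here is a minimal Python sketch that picks n distinct FASTA records uniformly at random without replacement. It is not USEARCH's implementation: the helper names (read_fasta, subsample, to_fasta) are hypothetical, and keeping the selected records in their input order is a choice made here, not documented USEARCH behavior.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str]  # (label, sequence)

def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (label, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records

def subsample(records: List[Record], n: int, rng: random.Random) -> List[Record]:
    """Pick n distinct records uniformly at random, keeping their input order."""
    n = min(n, len(records))
    keep = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in keep]

def to_fasta(records: List[Record]) -> str:
    """Write records back out as FASTA, one sequence line per record."""
    return "".join(f">{label}\n{seq}\n" for label, seq in records)

fasta = ">r1\nACGT\n>r2\nGGCC\nTTAA\n>r3\nAAAA\n>r4\nCCCC\n"
subset = subsample(read_fasta(fasta), 2, random.Random(42))
print(to_fasta(subset))
```

This sketch loads every record into memory for simplicity; a streaming technique such as reservoir sampling would avoid that for very large read files.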

This command is useful for making fast assessments of large datasets, e.g. by analyzing a small sample of NGS reads.

Paired reads are supported: use the -reverse option to specify the reverse (R2) filename. You must then also provide the -output2 option, which gives the filename for the reverse subset.
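For paired reads, a subset is only useful if each selected forward read keeps its mate. A minimal way to guarantee that, sketched below in Python, is to draw the random positions once and apply them to both R1 and R2. This is an illustration of the idea under the assumption that R1 and R2 list mates in the same order; the function name subsample_pairs is hypothetical and this is not USEARCH code.

```python
import random
from typing import List, Tuple

def subsample_pairs(r1: List[str], r2: List[str], n: int,
                    seed: int) -> Tuple[List[str], List[str]]:
    """Select the same random positions from R1 and R2 so mates stay paired."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    rng = random.Random(seed)
    # One draw of indices, reused for both files.
    keep = sorted(rng.sample(range(len(r1)), min(n, len(r1))))
    return [r1[i] for i in keep], [r2[i] for i in keep]

fwd = ["read1/1", "read2/1", "read3/1", "read4/1"]
rev = ["read1/2", "read2/2", "read3/2", "read4/2"]
out1, out2 = subsample_pairs(fwd, rev, 2, seed=7)
print(out1, out2)
```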

The size of the subset must be specified with either -sample_size, which gives the number of sequences to keep, or -sample_pct, which gives the percentage of the input sequences to keep.
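The two options describe the same quantity in different units. The Python sketch below converts either one into a record count. Two details are assumptions of this sketch rather than documented USEARCH behavior: a percentage is rounded down, and a -sample_size larger than the input is capped at the input size. The function name subset_size is hypothetical.

```python
from typing import Optional

def subset_size(n_input: int, sample_size: Optional[int] = None,
                sample_pct: Optional[float] = None) -> int:
    """Number of records to keep, given exactly one of a count or a percentage.

    Rounding down for a percentage and capping an oversized count are
    assumptions of this sketch.
    """
    if (sample_size is None) == (sample_pct is None):
        raise ValueError("give exactly one of sample_size or sample_pct")
    if sample_size is not None:
        return min(sample_size, n_input)
    if not 0 <= sample_pct <= 100:
        raise ValueError("sample_pct must be between 0 and 100")
    return int(n_input * sample_pct / 100)

print(subset_size(1000, sample_pct=10), subset_size(1000, sample_size=50))
```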

Size annotations (;size=N; in sequence labels) are supported if the -sizein and -sizeout options are given.

If the -xsize option is given, any size annotations in the input sequence labels are stripped.
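The annotation convention behind -sizein and -xsize is a size=N field in the semicolon-separated label, as in Uniq1;size=123;. The Python sketch below reads that field and removes it, which approximates what stripping a size annotation does to a label. The helper names get_size and strip_size are hypothetical, and the sketch does not attempt to reproduce how USEARCH uses the sizes when choosing the subset, which this page does not describe.

```python
from typing import Optional

def get_size(label: str) -> Optional[int]:
    """Return N from a ';size=N;' field, or None if the label has no size."""
    for field in label.split(";")[1:]:  # first field is the read name itself
        if field.startswith("size="):
            return int(field[len("size="):])
    return None

def strip_size(label: str) -> str:
    """Drop the size=N field, keeping the read name and any other fields."""
    head, *rest = label.split(";")
    kept = [f for f in rest if f and not f.startswith("size=")]
    return ";".join([head] + kept)

print(get_size("Uniq1;size=123;"), strip_size("Uniq1;size=123;"))
```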

The -randseed option sets a seed for the random number generator, enabling reproducible subsets to be generated. By default, the seed is taken from the system clock so that in general the subset will change each time the command is run. The value must be an integer.
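The effect of -randseed can be seen with any seeded pseudo-random generator. In the Python sketch below, passing the same integer seed returns the same subset every time, while omitting it falls back to a clock-based seed, mirroring the default described above. The function name pick_indices is hypothetical; Python's generator is not the one USEARCH uses, so a given seed will not reproduce the same subset as usearch -randseed.

```python
import random
import time
from typing import List, Optional

def pick_indices(n_input: int, n_keep: int,
                 randseed: Optional[int] = None) -> List[int]:
    """Return sorted indices of a random subset.

    A fixed randseed gives the same subset on every call; without it the
    seed comes from the system clock, so the subset usually changes.
    """
    seed = randseed if randseed is not None else time.time_ns()
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_input), n_keep))

a = pick_indices(1000, 10, randseed=1)
b = pick_indices(1000, 10, randseed=1)
print(a == b)  # True: same seed, same subset
```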

The -notmatched and -notmatchedfq options specify filenames for the sequences that are not selected, written in FASTA and FASTQ format respectively.
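Writing the unselected sequences as well splits the input into two disjoint parts that together contain every input sequence exactly once. The Python sketch below shows that partition on a list of read names. It is an illustration only; split_subset is a hypothetical name and this is not how USEARCH is implemented.

```python
import random
from typing import List, Tuple

def split_subset(records: List[str], n_keep: int,
                 seed: int) -> Tuple[List[str], List[str]]:
    """Split records into (selected, not_selected), preserving input order."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), n_keep))
    selected = [r for i, r in enumerate(records) if i in chosen]
    not_selected = [r for i, r in enumerate(records) if i not in chosen]
    return selected, not_selected

reads = [f"read{i}" for i in range(10)]
kept, rest = split_subset(reads, 3, seed=1)
print(kept, rest)
```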


usearch -fastx_subsample raw_reads.fastq -sample_pct 10 -randseed 1 -fastqout ten_pct.fastq

usearch -fastx_subsample derep.fa -sizein -sizeout -sample_size 10000 -fastaout ten_k.fa

usearch -fastx_subsample Sample_R1.fq -reverse Sample_R2.fq -fastqout fwd.fq \
  -output2 rev.fq -sample_pct 25