The fastx_subsample command generates a random subset of sequences
in a FASTA or FASTQ file. The subset is written to filename(s) given by the -fastaout
and/or -fastqout options.
This command is useful for making fast assessments of large datasets, e.g. by
analyzing a small sample of NGS reads.
Paired reads are supported by using the -reverse option to specify the
reverse (R2) filename. You must then provide the -output2 option for the
reverse-read output file.
The size of the subset must be specified by the -sample_size or -sample_pct
option, which give the number of sequences and the percentage of the input
sequences, respectively.
Size annotations are supported if the -sizein
and -sizeout options are given.
If the -xsize option is given, any size annotations in the input sequence
labels are stripped.
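
To make the idea of a size annotation concrete, here is a minimal Python sketch that reads and strips a size=N field in a semicolon-delimited label such as "Otu1;size=42;". The helper names parse_size and strip_size are hypothetical. The handling of the trailing semicolon and the default size of 1 are assumptions made for illustration, not a description of how usearch implements -sizein or -xsize.

```python
def parse_size(label, default=1):
    """Return the integer in a 'size=N' field, or `default` if absent."""
    for field in label.split(";"):
        if field.startswith("size="):
            return int(field[len("size="):])
    return default  # assumption: unannotated labels count as size 1


def strip_size(label):
    """Drop any 'size=N' field and keep the other label fields."""
    fields = [f for f in label.split(";") if f and not f.startswith("size=")]
    return ";".join(fields)


annotated = "Otu1;size=42;"
size = parse_size(annotated)       # abundance read from the label
stripped = strip_size(annotated)   # label with the annotation removed
```
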
The -randseed option sets the seed for the random number generator, which
allows reproducible subsets to be generated. The value must be an integer.
By default, the seed is taken from the system clock, so the subset will
generally change each time the command is run.
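
The effect of a fixed seed can be illustrated with a short Python sketch that draws a subset without replacement from a seeded generator. This uses Python's own random module and a hypothetical subsample function. It only demonstrates the general principle that the same seed gives the same subset, and it makes no claim about the random number generator usearch uses.

```python
import random


def subsample(records, sample_size, seed=None):
    """Pick `sample_size` records without replacement.

    With seed=None the generator is seeded from system entropy, so
    repeated calls will usually return different subsets.
    """
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


records = [f"read{i}" for i in range(1000)]  # stand-in for sequence labels
first = subsample(records, 10, seed=1)
second = subsample(records, 10, seed=1)      # same seed, same subset
```
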
The -notmatched and -notmatchedfq options specify output filenames for
sequences that were not selected, in FASTA and FASTQ format, respectively.
usearch -fastx_subsample raw_reads.fastq -sample_pct 10 -randseed 1 -fastaout ten_pct.f
usearch -fastx_subsample derep.fa -sizein -sizeout -sample_size 10000 -fastaout ten_k.fa
usearch -fastx_subsample Sample_R1.fq -reverse Sample_R2.fq -fastqout fwd.fq \
-output2 rev.fq -sample_pct 25
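
For paired reads, a subset is only useful if each forward read stays with its reverse mate. The Python sketch below shows one way to achieve that: choose the record positions once, then apply the same positions to both files. The function name subsample_pairs and the in-memory lists are hypothetical, and this is an illustration of the idea rather than a description of usearch internals.

```python
import random


def subsample_pairs(forward, reverse, sample_size, seed=None):
    """Select the same record positions from the R1 and R2 lists."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse inputs differ in length")
    rng = random.Random(seed)
    # Draw the positions once so that mates stay together, and sort
    # them to preserve the input order in both outputs.
    picked = sorted(rng.sample(range(len(forward)), sample_size))
    return [forward[i] for i in picked], [reverse[i] for i in picked]


r1 = [f"read{i}/1" for i in range(100)]  # stand-in for forward reads
r2 = [f"read{i}/2" for i in range(100)]  # stand-in for reverse reads
fwd, rev = subsample_pairs(r1, r2, 25, seed=1)
```
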