The fastx_subsample command generates a random subset of sequences
in a FASTA or FASTQ file. The subset is written to filename(s) given by the -fastaout
and/or -fastqout options.
This command is useful for making fast assessments of large datasets, e.g. by
analyzing a small sample of NGS reads, and for rarefaction.
The size of the subset must be specified by the -sample_size or -sample_pct
option, which give the number of sequences and the percentage of the input
sequences, respectively.
Size annotations are supported if the -sizein
and -sizeout options are given.
If the -xsize option is given, any size annotations in the input sequence
labels are stripped.
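To make the handling of size annotations concrete, here is a minimal Python sketch of reading and stripping them from a label. It assumes the usual ;size=N; label convention; the function names get_size and strip_size are hypothetical and are not part of usearch, and this is not a description of how usearch itself is implemented.

```python
import re

def get_size(label, default=1):
    """Return the abundance from a ';size=N;' annotation, or default if absent."""
    match = re.search(r"(?:^|;)size=(\d+)(?:;|$)", label)
    return int(match.group(1)) if match else default

def strip_size(label):
    """Remove any 'size=N' field from a semicolon-delimited label."""
    # Empty fields (from a trailing ';') are dropped along with size fields.
    fields = [f for f in label.split(";") if f and not f.startswith("size=")]
    return ";".join(fields)

annotated = "Otu1;size=250;"
size = get_size(annotated)
stripped = strip_size(annotated)
```

In this sketch a label without an annotation is treated as having size 1, which is an assumption made for illustration.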
The -randseed option sets a seed for the random number generator, enabling
reproducible subsets to be generated. The value must be an integer. By default,
the seed is taken from the system clock, so in general the subset will change
each time the command is run.
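The effect of a fixed seed, and the relationship between a size given as a count and as a percentage, can be illustrated with a short Python sketch. This is an illustration only, not the algorithm usearch uses: the function name subsample is hypothetical, sampling is assumed to be without replacement, and the rounding of the percentage to a whole number of sequences is an assumption.

```python
import random

def subsample(records, sample_size=None, sample_pct=None, seed=None):
    """Return a random subset of records, drawn without replacement."""
    if (sample_size is None) == (sample_pct is None):
        raise ValueError("give exactly one of sample_size or sample_pct")
    if sample_pct is not None:
        # Convert a percentage of the input to a count (rounding is an assumption).
        sample_size = round(len(records) * sample_pct / 100)
    if sample_size > len(records):
        raise ValueError("sample is larger than the input")
    # A fixed integer seed gives the same subset on every run;
    # seed=None lets Python pick a seed, so the subset varies between runs.
    rng = random.Random(seed)
    return rng.sample(records, sample_size)

reads = [f"read{i}" for i in range(100)]
first = subsample(reads, sample_pct=10, seed=1)
second = subsample(reads, sample_pct=10, seed=1)
```

Here first and second contain the same 10 reads because both calls use seed 1; omitting the seed would generally give a different subset on each call.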
usearch -fastx_subsample raw_reads.fastq -sample_pct 10 -randseed 1 -fastaout ten_pct.f
usearch -fastx_subsample derep.fa -sizein -sizeout -sample_size 10000 -fastaout ten_k.fa