USEARCH v11

fastx_mask command

 
The fastx_mask command performs low-complexity masking of sequences in a FASTA or FASTQ file.

The -qmask option specifies the masking algorithm to use. The default is fastnucleo for nucleotide sequences and fastamino for amino acid sequences.

Output is written to -fastaout (FASTA) and/or -fastqout (FASTQ). If the input file is FASTA, you cannot use -fastqout because the input has no base quality scores.

By default, soft masking is used: masked positions are converted to lower case. The -hardmask option specifies hard masking instead, which overwrites masked positions with N for nucleotides or X for amino acids.
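To make the difference between the two modes concrete, here is a small shell sketch that turns soft-masked FASTA into hard-masked FASTA. It is not part of USEARCH, and the function name soft_to_hard is hypothetical. It assumes that every lower-case letter on a sequence line marks a masked position, which only holds if the sequences were upper-case before masking, and it handles FASTA only (FASTQ quality lines would be corrupted).

```shell
#!/bin/sh
# Hypothetical helper: soft_to_hard MASKCHAR [file ...]
# Replaces lower-case (soft-masked) letters on FASTA sequence lines with MASKCHAR,
# e.g. N for nucleotides or X for amino acids. Header lines are copied unchanged.
# Assumption: lower case means "masked"; FASTA input only.
soft_to_hard() {
  m=$1
  shift
  # LC_ALL=C keeps [a-z] restricted to the ASCII lower-case letters.
  LC_ALL=C awk -v m="$m" '/^>/ { print; next } { gsub(/[a-z]/, m); print }' "$@"
}

# Usage: prints ">r1" then "ACGTNNNNNNACGT"
printf '>r1\nACGTaaaaaaACGT\n' | soft_to_hard N
```

In practice, passing -hardmask to fastx_mask is simpler; the sketch only shows how the two outputs relate.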

The -min_unmasked_pct option specifies the minimum percentage of the sequence that must remain unmasked. If a larger fraction is masked, the sequence is discarded.

The -max_unmasked_pct option specifies the maximum percentage of the sequence that may remain unmasked. If a smaller fraction is masked, the sequence is discarded. This option is typically used to find out which sequences were discarded by -min_unmasked_pct.
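The filtering rule of these two options can be sketched in shell on a soft-masked FASTA file. This is an illustration of the rule as worded above, not how USEARCH implements it: the helper name filter_unmasked is hypothetical, lower case is assumed to mean masked, and a sequence exactly at a threshold is assumed to be kept (the exact boundary behavior of USEARCH is not stated here). Multi-line sequences are joined onto one line in the output.

```shell
#!/bin/sh
# Hypothetical helper: filter_unmasked MIN MAX < soft_masked.fasta
# Keeps a record only if MIN <= unmasked% <= MAX, i.e. it discards sequences
# below MIN (like -min_unmasked_pct) or above MAX (like -max_unmasked_pct).
# Assumption: upper-case letters are unmasked, lower-case letters are masked.
filter_unmasked() {
  LC_ALL=C awk -v min="$1" -v max="$2" '
    function flush() {
      if (hdr == "") return
      n = length(seq)
      up = gsub(/[A-Z]/, "&", seq)   # count upper-case (unmasked) letters
      pct = (n > 0) ? 100 * up / n : 0
      if (pct >= min && pct <= max) print hdr "\n" seq
    }
    /^>/ { flush(); hdr = $0; seq = ""; next }
    { seq = seq $0 }
    END { flush() }'
}

# Record a is 80% unmasked, record b is 20% unmasked.
printf '>a\nACGTACGTaa\n>b\nACgtacgtac\n' | filter_unmasked 50 100   # keeps a
printf '>a\nACGTACGTaa\n>b\nACgtacgtac\n' | filter_unmasked 0 50     # keeps b
```

Running the same threshold once as a minimum and once as a maximum splits the input into kept and discarded sets, which is the use case for -max_unmasked_pct described above.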

Multithreading is supported.

Example

usearch -fastx_mask reads.fastq -fastqout masked.fastq -qmask dust -min_unmasked_pct 50