search_16s command (64-bit only)

See also
SEARCH_16S algorithm
SEARCH_16S paper

The search_16s command searches a long sequence such as a chromosome or contig for 16S genes.
It has exceptionally high accuracy, finding at least 99.9% of known 16S genes with few or no false positives.

A bit vector database is required and is specified with the -bitvec option. See the page on creating a bit vector file for the search_16s command.

Input can be in FASTQ or FASTA format.

-hitsout option
FASTA file containing "hits", i.e. regions with elevated density of signature words. These are candidate 16S genes with flanking sequence (see paper for details).

-fastaout option
FASTA file containing predicted 16S genes.

-fragout option
FASTA file containing probable fragments of 16S genes that lack one or both identifying motifs.

-tabbedout option
Tabbed text file containing records for query sequences, hits, full-length genes and fragments.

-start_motif option
Start motif. Default GNTTGATCNTGNC.

-end_motif option

-min_gene_length option
Minimum gene length. Default 1200.

-max_gene_length option
Maximum gene length. Default 2000.

-maxstartdiffs option
Maximum number of mismatches with the start motif. Default 4.

-maxenddiffs option
Maximum number of mismatches with the end motif. Default 4.
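
To make the motif and length options concrete, the following Python sketch shows one way a motif with wildcard positions could be compared against a sequence while counting mismatches, and how a length window could be applied. It is an illustration only, not the USEARCH implementation: the function names (count_diffs, find_motif, length_ok) are hypothetical, the IUPAC interpretation of N is an assumption, and measuring length as end minus start is a guess about how the limits are applied. The default values are the ones listed above.

```python
# Illustrative sketch only; this is NOT how USEARCH is implemented.
# IUPAC nucleotide codes mapped to the bases each code allows (assumed interpretation).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

START_MOTIF = "GNTTGATCNTGNC"  # default -start_motif from the option list


def count_diffs(motif, window):
    """Count positions where the window base is not allowed by the motif code.

    An ambiguous base in the window (e.g. N) is counted as a mismatch here.
    """
    if len(motif) != len(window):
        raise ValueError("motif and window must have the same length")
    return sum(
        1
        for m, b in zip(motif.upper(), window.upper())
        if b not in IUPAC.get(m, "")
    )


def find_motif(seq, motif, max_diffs):
    """Return (position, diffs) for every window within max_diffs of the motif."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        d = count_diffs(motif, seq[i:i + len(motif)])
        if d <= max_diffs:
            hits.append((i, d))
    return hits


def length_ok(start, end, min_len=1200, max_len=2000):
    """Check a candidate span against the documented default length limits."""
    return min_len <= end - start <= max_len


if __name__ == "__main__":
    seq = "AAAA" + "GATTGATCCTGGC" + "TTTT"
    print(find_motif(seq, START_MOTIF, max_diffs=0))
    print(length_ok(0, 1500))
```

Raising max_diffs in this sketch admits more candidate positions, which mirrors the trade-off the -maxstartdiffs and -maxenddiffs options control: more tolerance for divergent motifs at the cost of more spurious matches.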


usearch -search_16s contigs.fa -bitvec gg97.bitvec -fastaout 16s.fa -tabbedout results.txt
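
When the command is run from a pipeline, the argument list can be assembled in a script. The Python sketch below builds the same command line as the example above using only the options documented on this page. The helper name build_search_16s_cmd and its parameters are hypothetical, and it assumes an executable named usearch is on the PATH; the command is printed rather than executed.

```python
import shlex


def build_search_16s_cmd(input_path, bitvec_path, hitsout=None, fastaout=None,
                         fragout=None, tabbedout=None, usearch="usearch"):
    """Build an argument list for search_16s (hypothetical helper).

    Only options documented on this page are used; output options left as
    None are omitted from the command.
    """
    cmd = [usearch, "-search_16s", input_path, "-bitvec", bitvec_path]
    outputs = (
        ("-hitsout", hitsout),
        ("-fastaout", fastaout),
        ("-fragout", fragout),
        ("-tabbedout", tabbedout),
    )
    for flag, path in outputs:
        if path is not None:
            cmd += [flag, path]
    return cmd


cmd = build_search_16s_cmd(
    "contigs.fa", "gg97.bitvec", fastaout="16s.fa", tabbedout="results.txt"
)
print(shlex.join(cmd))
# To actually run it (requires usearch to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.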