USEARCH v11

otutab command

See also
  Defining and interpreting OTUs
  OTU / denoising analysis
  OTU commands

The otutab command generates an OTU table by mapping reads to OTUs.

OTU table output
See OTU table output options.

Normalizing the table
After generating the table, I recommend using the otutab_rare command to normalize all samples to the same number of reads.

Query dataset
The query file can be in FASTQ or FASTA format. Every query sequence must be labeled with a sample identifier. The fastx_get_sample_names command can be used to check that your sample names are formatted correctly.
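
As a quick sanity check before running otutab, the sample names can also be listed with standard tools. The sketch below assumes that every FASTQ label carries a sample=NAME; annotation and that records are exactly four lines long; the function name list_samples is hypothetical, and this is not a substitute for fastx_get_sample_names.

```shell
# list_samples FILE: print each distinct sample name found in the FASTQ
# labels, followed by the number of reads carrying it.
# Assumes 4-line FASTQ records and a "sample=NAME;" annotation in each label
# (an assumption made for this sketch, not a requirement stated on this page).
list_samples() {
  awk 'NR % 4 == 1 {
         if (match($0, /sample=[^;]+;/)) {
           # drop the leading "sample=" (7 characters) and the trailing ";"
           name = substr($0, RSTART + 7, RLENGTH - 8)
           count[name]++
         } else {
           count["(no sample annotation)"]++
         }
       }
       END { for (s in count) printf "%s\t%d\n", s, count[s] }' "$1" | sort
}
```

Calling list_samples reads.fq prints one line per sample, which makes misspelled or missing sample names easy to spot.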

Query sequences are typically raw reads, i.e. reads after paired read merging, if applicable, but before quality filtering. Low-quality reads and singletons can often be mapped successfully to an OTU, so including them accounts for a larger fraction of the reads (see OTU coverage). The fastx_uniques_persample command can be used to find the unique sequences and abundances for all samples. This compresses the input data and makes the otutab command somewhat faster, though probably not by as much as you might expect (typically, the compression is only ~2x).

OTU database
The search database is either a set of OTU sequences or "ZOTU" sequences, i.e. denoised sequences. Each query sequence is mapped to the closest database sequence. Ties are broken systematically by picking the first in database file order. A udb database can be used. Database sequences must be labeled with OTU identifiers. The database file is specified by the -otus or -zotus option. Use -zotus if the OTUs are denoised, -otus otherwise.

Identity threshold for mapping
The -id option sets the minimum fractional identity. Default is 0.97, corresponding to 97% identity. Denoised OTUs also use a 97% identity threshold by default to allow for sequencing and PCR error. See defining and interpreting OTUs and mapping reads to OTUs for discussion.
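
To make the threshold concrete, the small calculation below shows how many mismatches a read may contain and still meet a given -id value. It assumes, for illustration only, that identity is the number of matching positions divided by the read length over a gap-free alignment spanning the whole read; the function name max_diffs is hypothetical.

```shell
# max_diffs LENGTH PERCENT: largest number of mismatches d such that
# (LENGTH - d) / LENGTH >= PERCENT / 100, using integer arithmetic only.
max_diffs() {
  echo $(( $1 * (100 - $2) / 100 ))
}

max_diffs 250 97   # 250 nt read at -id 0.97
max_diffs 250 99   # same read at -id 0.99
```

Under these assumptions a 250 nt read tolerates 7 mismatches at 97% and 2 at 99%.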

By default, reads are assumed to be on the same strand as the OTU sequences. You can use -strand both to search both strands.
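
Putting the database, identity and strand options together, a run against denoised sequences might look like the sketch below. It uses only options described on this page; the file names are placeholders, and the RUN variable and otutab_zotus function are hypothetical conveniences that print the command by default instead of executing it.

```shell
# otutab_zotus READS ZOTUS OUT: map READS to the denoised sequences in ZOTUS,
# searching both strands with an explicit 97% identity (the default value).
# By default the command is only printed (dry run); set RUN="" in the
# environment to execute it, which requires usearch on the PATH.
otutab_zotus() {
  ${RUN-echo} usearch -otutab "$1" -zotus "$2" \
    -id 0.97 -strand both -otutabout "$3"
}

otutab_zotus reads.fq zotus.fa zotutab.txt
```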

Other output files
Standard output files are supported for reporting hits.

The -mapout option specifies the name of a tab-separated text file with two fields: (1) read label and (2) OTU label. The same output can be obtained by using -userout with -userfields query+target and -top_hit_only.
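
Because the map file has one line per assigned query sequence, it can be tallied independently to cross-check the OTU table. The sketch below counts lines per sample and OTU, assuming read labels carry a sample=NAME; annotation and that the input was not dereplicated (size annotations are ignored); the function name count_map is hypothetical.

```shell
# count_map FILE: read a two-column tab-separated map (read label, OTU label)
# and print "sample<TAB>OTU<TAB>reads", sorted.
# Reads without a "sample=NAME;" annotation are grouped under "unknown".
count_map() {
  awk -F '\t' '{
         sample = "unknown"
         if (match($1, /sample=[^;]+;/))
           sample = substr($1, RSTART + 7, RLENGTH - 8)
         n[sample "\t" $2]++
       }
       END { for (k in n) printf "%s\t%d\n", k, n[k] }' "$1" | sort
}
```

For example, count_map map.txt lists every sample and OTU pair that received at least one read, together with its read count.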

The -notmatched option specifies a FASTA filename for sequences which are not assigned to an OTU.

The -notmatchedfq option specifies a FASTQ file for unassigned sequences (input must be FASTQ).

Multithreading is supported.

Example

usearch -otutab reads.fq -otus otus.udb -otutabout otutab.txt -biomout otutab.json \
  -mapout map.txt -notmatched unmapped.fa -dbmatched otus_with_sizes.fa -sizeout