usearch manual
otutab command

See also
  Defining and interpreting OTUs
  OTU clustering
  UPARSE pipeline
  UNOISE pipeline
  OTU commands

The otutab command generates an OTU table by mapping reads to OTUs.

OTU table output
See OTU table output options.

Normalizing the table
After generating the table, you should use the otutab_norm command to normalize all samples to the same number of reads.
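
To illustrate what normalizing to the same number of reads means, here is a minimal Python sketch that rescales each sample's counts to a common total. It assumes simple proportional scaling with rounding, which may differ from what otutab_norm does internally. The function name normalize_counts and the target of 1000 reads are hypothetical, chosen only for the example.

```python
def normalize_counts(counts, target=10000):
    """Scale one sample's OTU counts so they sum to roughly `target`.

    Assumption: plain proportional scaling with rounding. This is an
    illustration of the idea, not the otutab_norm implementation.
    """
    total = sum(counts.values())
    if total == 0:
        return {otu: 0 for otu in counts}
    return {otu: round(n * target / total) for otu, n in counts.items()}


# Two samples with the same composition but very different read depths.
sample_a = {"Otu1": 150, "Otu2": 50}
sample_b = {"Otu1": 3000, "Otu2": 1000}

print(normalize_counts(sample_a, target=1000))
print(normalize_counts(sample_b, target=1000))
```

After scaling, both samples have the same counts, so differences in sequencing depth no longer look like differences in abundance.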

Query dataset
The query file can be in FASTQ or FASTA format. Every query sequence must be labeled with a sample identifier. The fastx_get_sample_names command can be used to check that your sample names are formatted correctly.

Query sequences are typically raw reads, i.e. reads after paired-read merging (if applicable) but before quality filtering. Low-quality reads and singletons can often be mapped successfully to an OTU, so including them lets the table account for a larger fraction of the reads. The fastx_uniques_persample command can be used to find the unique sequences and their abundances in each sample. This compresses the input data and makes the otutab command somewhat faster, though usually by less than you might expect: the compression is typically only about 2x.
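
The idea of finding unique sequences and abundances per sample can be sketched in a few lines of Python. This is only a conceptual illustration. It assumes reads are already available as (sample, sequence) pairs, and the function name uniques_per_sample is hypothetical. It does not reproduce the output format of fastx_uniques_persample.

```python
from collections import Counter, defaultdict


def uniques_per_sample(reads):
    """Count how often each distinct sequence occurs within each sample.

    reads: iterable of (sample_name, sequence) pairs.
    Returns a dict mapping sample_name -> Counter(sequence -> abundance).
    """
    table = defaultdict(Counter)
    for sample, seq in reads:
        table[sample][seq.upper()] += 1
    return table


reads = [
    ("S1", "ACGT"),
    ("S1", "ACGT"),
    ("S1", "ACGA"),
    ("S2", "ACGT"),
]
uniques = uniques_per_sample(reads)

n_records = sum(len(c) for c in uniques.values())
print(f"{len(reads)} reads -> {n_records} unique records")
```

Because each unique record still has to be searched against the database once per sample, the saving equals the compression ratio at best.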

OTU database
The search database is a set of either OTU sequences or "ZOTU" sequences, i.e. denoised sequences. Each query sequence is mapped to the closest database sequence. Ties are broken deterministically by choosing the sequence that appears first in the database file. A udb database can be used. Database sequences must be labeled with OTU identifiers. The database file is specified by the -otus or -zotus option. Use -zotus if the OTUs are denoised, -otus otherwise.
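
The closest-match rule with ties resolved by database order can be sketched as follows. This is a toy illustration, not the usearch search algorithm. It assumes equal-length sequences and uses position-by-position matching in place of a real alignment. The names identity and closest_otu are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions.

    Toy stand-in for alignment identity: only defined here for
    sequences of equal, non-zero length.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_otu(query, database):
    """Return (otu_id, identity) for the best-matching database entry.

    database: list of (otu_id, sequence) in file order.
    """
    best_id, best_score = None, -1.0
    for otu_id, seq in database:
        score = identity(query, seq)
        # Strict '>' means an equally good later entry never replaces
        # an earlier one, so ties go to the first entry in file order.
        if score > best_score:
            best_id, best_score = otu_id, score
    return best_id, best_score


database = [
    ("Otu1", "ACGTACGT"),
    ("Otu2", "ACGTACGA"),
    ("Otu3", "TTTTACGT"),
]

# Otu1 and Otu2 both differ from the query at one position (a tie).
print(closest_otu("ACGTACGC", database))
```

Here the query is assigned to Otu1 because it comes before Otu2 in the database, so the result is reproducible for a given file order.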

Identity threshold for mapping
The -id option sets the minimum fractional identity. Default is 0.97, corresponding to 97% identity. Denoised OTUs also use a 97% identity threshold by default to allow for sequencing and PCR error. See defining and interpreting OTUs for discussion.
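
As a worked example of what a 0.97 threshold means in practice, the sketch below checks whether a given number of matching positions is enough. It assumes identity is the number of matches divided by the alignment length. That is a simplification, and the function name passes_threshold is hypothetical.

```python
def passes_threshold(matches, alignment_length, min_id=0.97):
    """True if matches / alignment_length reaches the minimum identity.

    Assumption: identity = matches / alignment length. The definition
    used by the real aligner may treat gaps differently.
    """
    if alignment_length <= 0:
        return False
    return matches / alignment_length >= min_id


# For a 250-column alignment, 7 differences still pass but 8 do not.
print(passes_threshold(243, 250))  # 243 / 250 = 0.972
print(passes_threshold(242, 250))  # 242 / 250 = 0.968
```

On this simplified definition, a 250 nt read can differ from its OTU at up to 7 positions and still be assigned to it at the default threshold.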

By default, reads are assumed to be on the same strand as the OTU sequences. You can use -strand both to search both strands.
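
Searching both strands amounts to also comparing the reverse complement of each read. The short Python sketch below shows the transformation. It is illustrative only, the function name reverse_complement is hypothetical, and only unambiguous A/C/G/T letters are handled.

```python
# Translation table for complementing unambiguous nucleotides.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


otu = "AACGTTGC"
minus_strand_read = reverse_complement(otu)

# The read does not match the OTU as given, only after reverse-complementing.
print(minus_strand_read == otu)
print(reverse_complement(minus_strand_read) == otu)
```

A read like this, sequenced from the opposite strand, would not be matched under the default same-strand assumption.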

The -notmatched option specifies a FASTA filename for sequences which are not assigned to an OTU.

The -notmatchedfq option specifies a FASTQ file for unassigned sequences (input must be FASTQ).

Annotations are stripped from the OTU sequence labels unless the -keep_annots option is specified.

Multithreading and standard output files are supported.

Example

usearch -otutab reads.fq -otus otus.udb -otutabout otutab.txt -biomout otutab.json \
  -mapout map.txt -notmatched unmapped.fa