otutab command

The otutab command generates an OTU table by mapping reads to OTUs.

OTU table output
See OTU table output options.

Normalizing the table
After generating the table, I recommend using the otutab_rare command to normalize all samples to the same number of reads.
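
To make "the same number of reads" concrete, here is a minimal Python sketch of rarefaction by random subsampling without replacement. It illustrates the idea only. Whether otutab_rare uses exactly this procedure is an assumption, and the function name, the seed parameter and the toy table are hypothetical.

```python
import random

def rarefy(counts, sample_size, seed=0):
    """Subsample one sample's OTU counts, without replacement, to sample_size reads.

    counts: dict mapping OTU label -> read count for a single sample.
    Returns a new dict, or None if the sample has fewer than sample_size reads.
    """
    total = sum(counts.values())
    if total < sample_size:
        return None  # too few reads; the caller decides whether to drop the sample
    # Expand the counts into one entry per read, then draw sample_size of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, sample_size)
    out = {otu: 0 for otu in counts}
    for otu in picked:
        out[otu] += 1
    return out

# Toy table (hypothetical data): sample -> {OTU -> count}
table = {
    "S1": {"Otu1": 60, "Otu2": 30, "Otu3": 10},
    "S2": {"Otu1": 5, "Otu2": 40, "Otu3": 5},
}
rarefied = {sample: rarefy(counts, 50) for sample, counts in table.items()}
```

After this runs, every sample in rarefied has exactly 50 reads. S2 already had 50 reads, so its counts are unchanged.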

Query dataset
The query file can be in FASTQ or FASTA format. Every query sequence must be labeled with a sample identifier. The fastx_get_sample_names command can be used to check that your sample names are formatted correctly.

Query sequences are typically raw reads, i.e. reads after paired read merging, if applicable, but before quality filtering. Low-quality reads and singletons can often be mapped successfully to an OTU, so including them accounts for a larger fraction of the reads (see OTU coverage). The fastx_uniques_persample command can be used to find the unique sequences and their abundances for all samples. This compresses the input data and makes the otutab command somewhat faster, but probably not by as much as you might expect (typically, the compression is only ~2x).
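
The following Python sketch illustrates what per-sample dereplication amounts to and how the compression ratio can be measured. It is not the fastx_uniques_persample implementation. It assumes that each read label carries a sample=NAME; annotation, which is one possible labeling convention and should be checked against your own data. The function name and the toy reads are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed label convention: "...;sample=NAME;" somewhere in the read label.
SAMPLE_RE = re.compile(r"sample=([^;]+);")

def uniques_per_sample(records):
    """records: iterable of (label, sequence) pairs.

    Returns {sample: Counter mapping unique sequence -> abundance}.
    """
    per_sample = defaultdict(Counter)
    for label, seq in records:
        m = SAMPLE_RE.search(label)
        if m is None:
            raise ValueError(f"no sample identifier in label: {label!r}")
        per_sample[m.group(1)][seq.upper()] += 1
    return per_sample

# Toy reads (hypothetical data)
reads = [
    ("r1;sample=A;", "ACGT"),
    ("r2;sample=A;", "ACGT"),
    ("r3;sample=A;", "ACGA"),
    ("r4;sample=B;", "ACGT"),
]
uniq = uniques_per_sample(reads)
n_unique = sum(len(c) for c in uniq.values())
compression = len(reads) / n_unique  # reads per unique sequence
```

Identical sequences are merged only within a sample, so the same sequence seen in samples A and B still counts as two unique entries.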

OTU database
The search database is either a set of OTU sequences or "ZOTU" sequences, i.e. denoised sequences. Each query sequence is mapped to the closest database sequence. Ties are broken systematically by picking the first in database file order. A udb database can be used. Database sequences must be labeled with OTU identifiers. The database file is specified by the -otus or -zotus option. Use -zotus if the OTUs are denoised, -otus otherwise.
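
The tie-breaking rule can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the search algorithm used by the otutab command: identity is computed as the fraction of matching positions between equal-length sequences, with no alignment, and the function names and toy database are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def closest_otu(query, database):
    """database: list of (otu_label, sequence) pairs in file order.

    Returns (label, identity) of the closest entry.
    """
    best_label, best_id = None, -1.0
    for label, seq in database:
        ident = identity(query, seq)
        # Strict '>' keeps the earlier database entry when identities tie.
        if ident > best_id:
            best_label, best_id = label, ident
    return best_label, best_id

# Toy database (hypothetical data), in file order
db = [("Otu1", "ACGTACGTAC"), ("Otu2", "ACGTACGTAA"), ("Otu3", "TTTTTTTTTT")]
hit = closest_otu("ACGTACGTAG", db)
```

Here the query differs from both Otu1 and Otu2 at one position out of ten, so the two tie at 0.9 identity and Otu1 wins because it appears first.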

Identity threshold for mapping
The -id option sets the minimum fractional identity. Default is 0.97, corresponding to 97% identity. Denoised OTUs also use a 97% identity threshold by default to allow for sequencing and PCR error. See defining and interpreting OTUs and mapping reads to OTUs for discussion.
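
As a rough worked example of what a 0.97 threshold means in practice, the Python sketch below checks how many differences a 250-column alignment can contain and still pass. It assumes that identity is simply matches divided by alignment columns. The exact identity definition used by the program may differ, and the function name and read length are hypothetical.

```python
def passes_threshold(matches, columns, min_id=0.97):
    """True if matches / columns meets the minimum fractional identity."""
    return matches / columns >= min_id

read_length = 250  # hypothetical alignment length
# Map number of differences -> whether the hit would be accepted.
results = {mm: passes_threshold(read_length - mm, read_length) for mm in (0, 7, 8)}
```

Under this assumption, 7 differences give 243/250 = 0.972 and pass, while 8 differences give 242/250 = 0.968 and fail.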

By default, reads are assumed to be on the same strand as the OTU sequences. You can use -strand both to search both strands.
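
Searching both strands amounts to also comparing the reverse complement of each read. The Python sketch below shows the idea under simplifying assumptions. The identity function is a position-by-position comparison of equal-length sequences rather than an alignment, and all names are hypothetical.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_strand(query, target):
    """Compare the query on both strands; return (strand, identity) of the better one."""
    plus = identity(query, target)
    minus = identity(revcomp(query), target)
    # Prefer the plus strand when both strands score equally.
    return ("+", plus) if plus >= minus else ("-", minus)

strand_hit = best_strand("CGTT", "AACG")
```

In this toy case the read matches the target only after reverse-complementing, so a plus-strand-only search would miss it.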

Other output files
Standard output files are supported for reporting hits.

The -mapout option specifies the name of a tab-separated text file with two fields: 1. read label and 2. OTU label. You would get the same output by using -userout together with -userfields query+target and -top_hit_only.
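
Because the map file is just read label and OTU label separated by a tab, it is easy to post-process. The Python sketch below tallies reads per OTU and sample from such a file. It assumes that read labels in the file carry a sample=NAME; annotation, which is an assumption about your labels, and the function name and example rows are hypothetical.

```python
import io
import re
from collections import Counter

# Assumed label convention: "...;sample=NAME;" somewhere in the read label.
SAMPLE_RE = re.compile(r"sample=([^;]+);")

def table_from_mapout(handle):
    """Count reads per (OTU label, sample) from a two-column, tab-separated map file."""
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read_label, otu_label = fields[0], fields[1]
        m = SAMPLE_RE.search(read_label)
        sample = m.group(1) if m else "unknown"
        counts[(otu_label, sample)] += 1
    return counts

# Stand-in for a file opened with open("map.txt") (hypothetical rows)
example = io.StringIO(
    "r1;sample=A;\tOtu1\n"
    "r2;sample=A;\tOtu1\n"
    "r3;sample=B;\tOtu2\n"
)
counts = table_from_mapout(example)
```

The resulting counts can be compared against the OTU table written by the command as a sanity check on sample labeling.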

The -notmatched option specifies a FASTA filename for sequences which are not assigned to an OTU.

The -notmatchedfq option specifies a FASTQ file for unassigned sequences (input must be FASTQ).

Multithreading is supported.

Example

usearch -otutab reads.fq -otus otus.udb -otutabout otutab.txt -biomout otutab.json \
-mapout map.txt -notmatched unmapped.fa -dbmatched otus_with_sizes.fa -sizeout