Output from cluster_otus is a FASTA file containing OTU representative sequences. Further analysis often requires an OTU table, which requires assigning reads to OTUs.
One method for assigning a read to an OTU is to find the OTU representative sequence with the highest identity to the read. If two or more OTUs tie for the highest identity, the assignment is ambiguous. This is a database search task: reads are the query sequences and the OTU representative sequences are the database to be searched. A threshold of 97% identity is typically used. Reads that do not match any OTU at this identity are discarded.
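The assignment rule described above can be sketched in Python. This is a minimal illustration, not how usearch_global is implemented. The function name assign_read, the input (a mapping from OTU identifier to a precomputed fractional identity) and the alphabetical tie-break are all assumptions made for the example.

```python
def assign_read(identities, min_id=0.97):
    """Return (otu, ambiguous) for one read, or (None, False) if discarded.

    identities: dict mapping OTU identifier -> fractional identity (0..1)
    between the read and that OTU's representative sequence (hypothetical input).
    """
    if not identities:
        return None, False
    best = max(identities.values())
    if best < min_id:
        # No OTU reaches the threshold, so the read is discarded.
        return None, False
    # Collect every OTU sharing the top identity; more than one means a tie.
    top = sorted(otu for otu, ident in identities.items() if ident == best)
    # Assumption: break ties alphabetically and flag the result as ambiguous.
    return top[0], len(top) > 1
```

Returning the ambiguity flag alongside the OTU lets a caller decide whether to keep, drop or separately report tied reads.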
The usearch_global command supports generating OTU tables using the options described below.
Sequence labels must have sample identifiers (input set) and OTU identifiers (database) as explained later in this page. This means that you cannot use the input file to cluster_otus for this step. Several samples often share the same unique sequence, so a dereplicated (unique) sequence label either has no sample identifier or has a misleading one, because the same sequence may be found in other samples. The usual solution is to go back to the "raw" reads after merging or truncating to a fixed length. See sample identifiers for ways to add sample identifiers to the read labels.
(v8.1.1800 and later). QIIME classic tabbed text format.
(v8.1.1800 and later). BIOM v1.0 format (JSON). The biom utility can be used to convert to BIOM v2.1 format (HDF5).
(v8.1.1822 and later). Mothur "shared" file.
The OTU sequences must have OTU identifiers in the labels
See OTU identifiers for details.
Reads must have sample identifiers in the labels
See sample identifiers for details.
Singletons and low-quality reads
You can, and probably should, include singletons and reads that did not pass the quality filter. If they are 97% similar to an OTU sequence, they are probably good enough to count even if they contain some sequencer or PCR error.
Reads should be trimmed
The reads should be trimmed in the same way (if any) as the input sequences you used for cluster_otus.
If a size annotation is found in a read label, the abundance will be added to the total for its OTU.
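The effect of size annotations on the counts can be sketched as follows. This is an illustration only, assuming the usual size=N; form of the annotation. The names read_abundance and add_hits are hypothetical, and the sample and OTU of each read are passed in explicitly rather than parsed from labels.

```python
import re
from collections import defaultdict

# Matches a size annotation such as "size=12;" at the start of a label
# or after a semicolon.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")

def read_abundance(label):
    """Return the abundance in a read label, or 1 if there is no size annotation."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else 1

def add_hits(hits):
    """Sum abundances per OTU and sample.

    hits: iterable of (read_label, sample, otu) tuples, one per assigned read
    (hypothetical input; sample and OTU are supplied by the caller).
    """
    table = defaultdict(lambda: defaultdict(int))
    for label, sample, otu in hits:
        table[otu][sample] += read_abundance(label)
    return table
```

A read without a size annotation counts once, so annotated and unannotated reads can be mixed in the same input.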
Typical command to generate an OTU table
With correctly formatted labels, the OTU table is generated using a command like this.
usearch -usearch_global reads.fa -db otus.fa -strand plus -id 0.97 -otutabout otu_table.txt
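For downstream work, a tabbed OTU table can be loaded with a few lines of Python. The sketch below assumes a single header line whose first column names the OTU identifier, followed by one column per sample holding integer counts, and no taxonomy column. The function name read_otu_table and the example data are hypothetical.

```python
import csv
import io

def read_otu_table(handle):
    """Parse a tab-separated OTU table into {otu: {sample: count}}."""
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    samples = header[1:]  # the first column holds the OTU identifier
    table = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {s: int(c) for s, c in zip(samples, row[1:])}
    return table

# Small in-memory example; for a real file, pass open("otu_table.txt", newline="").
example = "#OTU ID\tS1\tS2\nOtu1\t10\t0\nOtu2\t3\t7\n"
table = read_otu_table(io.StringIO(example))
```

If the table carries an extra non-numeric column, the int conversion will raise an error, so that column would need to be split off first.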
Generating an OTU table with predicted taxonomies
If the OTU sequences have tax=xxx; annotations, these will be included as an extra column in the tabbed file or as taxonomy metadata in the BIOM file. These annotations can be generated using the -fastaout option of the utax command, e.g.:
usearch -utax otus.fa -db 16s.udb -strand both -fastaout otus_tax.fa
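To check which OTU labels carry a taxonomy annotation before building the table, the tax=xxx; field can be pulled out with a regular expression. This is a minimal sketch: the function name taxonomy_from_label is hypothetical, and the sketch assumes the annotation value contains no semicolon.

```python
import re

# Captures everything between "tax=" and the next semicolon.
TAX_RE = re.compile(r"(?:^|;)tax=([^;]*);")

def taxonomy_from_label(label):
    """Return the value of the tax=...; annotation, or None if it is absent."""
    match = TAX_RE.search(label)
    return match.group(1) if match else None
```

For example, a made-up label such as Otu1;tax=d:Bacteria,p:Firmicutes; yields the string d:Bacteria,p:Firmicutes, while a label with no annotation yields None.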