
See also
  OTU clustering
  UPARSE pipeline
  cluster_otus command
Output from cluster_otus is a FASTA file containing OTU representative sequences. Further analysis often requires an OTU table, which in turn requires assigning reads to OTUs.
I recommend creating OTUs from pooled samples, i.e. by concatenating reads for all samples that were sequenced in the same run. This is important for getting the best detection of chimeras and cross-talk, and for getting the best sensitivity to low-abundance sequences that could be lost if individual samples or subsets of samples are clustered separately.
One method for assigning a read to an OTU is to find the OTU representative sequence with the highest identity to the read; ties are possible, in which case the assignment is ambiguous. This is a database search task: reads are the query sequences and the OTU representative sequences are the database to be searched. A threshold of 97% identity is typically used. Reads that do not match any OTU at this identity or higher are discarded.
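The best-hit rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how usearch_global works internally: the `identity` function is a simplified stand-in that compares equal-length sequences position by position, whereas a real search aligns the sequences first. The function names and the toy sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions.

    Simplifying assumption: both sequences are non-empty and have the same
    length, so no alignment is needed. A real search tool aligns them first.
    """
    if not a or len(a) != len(b):
        raise ValueError("sketch assumes non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_read(read: str, otus: dict[str, str], min_id: float = 0.97):
    """Return (otu_ids, best_identity) for one read.

    otu_ids lists every OTU tied for the highest identity, so more than one
    entry means the assignment is ambiguous. An empty list means no OTU
    reached min_id and the read would be discarded.
    """
    if not otus:
        raise ValueError("the OTU database is empty")
    scores = {otu_id: identity(read, seq) for otu_id, seq in otus.items()}
    best = max(scores.values())
    if best < min_id:
        return [], best
    winners = sorted(otu_id for otu_id, s in scores.items() if s == best)
    return winners, best


# Toy database of three 100-nt "OTU representative sequences".
demo_otus = {
    "Otu1": "A" * 100,
    "Otu2": "A" * 98 + "CC",
    "Otu3": "G" * 100,
}

print(assign_read("A" * 100, demo_otus))            # unique best hit
print(assign_read("A" * 99 + "C", demo_otus))       # tie between two OTUs
print(assign_read("G" * 90 + "T" * 10, demo_otus))  # below threshold, discarded
```

Returning all tied OTUs rather than picking one makes the ambiguity explicit; how a tie is finally resolved is a separate decision that this sketch leaves open.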
The usearch_global command supports generating OTU tables using the options described below.
Sequence labels must have sample identifiers (input set) and OTU identifiers (database), as explained later on this page. This means that you cannot use the input file to cluster_otus for this step. Several samples often have the same unique sequence, so the labels of the dereplicated (unique) sequences either have no sample identifier or have a misleading one, because the same sequence may be found in other samples. The usual solution is to go back to the "raw" reads after merging or truncating to a fixed length. See sample identifiers for ways to add sample identifiers to the read labels.
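As a rough illustration of labelling and pooling the raw reads, the Python sketch below appends a sample annotation to every FASTA header and concatenates the samples. The `;sample=NAME;` label form is an assumption made for this example; the sample identifiers page is the authority on which label formats are accepted. The function names are hypothetical.

```python
def add_sample_id(fasta_text: str, sample: str) -> str:
    """Append ';sample=<sample>;' to every FASTA header line.

    Assumption: the sample identifier is carried as a 'sample=' annotation
    at the end of the label. Sequence lines are passed through unchanged.
    """
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Drop any trailing ';' first so the result has no ';;'.
            line = line.rstrip(";") + f";sample={sample};"
        lines.append(line)
    return "\n".join(lines) + "\n" if lines else ""


def pool(samples: dict[str, str]) -> str:
    """Label each sample's reads and concatenate them into one FASTA text."""
    return "".join(add_sample_id(text, name) for name, text in samples.items())


# Two toy samples that share the same sequence under the same read name.
demo = {
    "S1": ">r1\nACGT\n",
    "S2": ">r1\nACGT\n>r2\nGGGG\n",
}
print(pool(demo), end="")
```

After labelling, the identical sequence from S1 and S2 stays distinguishable by sample, which is exactly the information that is lost once reads are dereplicated.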
-otutabout filename
	QIIME classic tabbed text format.
-biomout filename
	BIOM v1.0 format (JSON). The biom utility can be used to convert to BIOM v2.1 format (HDF5).
-mothur_shared_out filename
	Mothur "shared" file.
The OTU sequences must have OTU identifiers in the labels
	See OTU identifiers for details.
Reads must have sample identifiers in the labels
	See sample identifiers for details.
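To show why both kinds of identifier are needed, here is a minimal Python sketch that tallies an OTU table from (read label, OTU identifier) pairs. It assumes, purely for illustration, that the sample is given as a `sample=NAME` annotation in the read label; the regular expression and function names are hypothetical and are not part of usearch.

```python
import re
from collections import Counter

# Assumed label convention for this sketch: '...;sample=NAME;...'
SAMPLE_RE = re.compile(r"(?:^|;)sample=([^;]+)")


def sample_of(label: str) -> str:
    """Extract the sample identifier from a read label."""
    match = SAMPLE_RE.search(label)
    if match is None:
        raise ValueError(f"no sample identifier in label: {label!r}")
    return match.group(1)


def build_table(assignments):
    """Count reads per (OTU, sample).

    assignments is an iterable of (read_label, otu_id) pairs, one pair per
    read that was assigned to an OTU. Returns (otus, samples, rows), where
    rows[i][j] is the number of reads from samples[j] assigned to otus[i].
    """
    counts = Counter()
    for read_label, otu_id in assignments:
        counts[(otu_id, sample_of(read_label))] += 1
    otus = sorted({otu for otu, _ in counts})
    samples = sorted({sample for _, sample in counts})
    rows = [[counts[(otu, sample)] for sample in samples] for otu in otus]
    return otus, samples, rows


demo_hits = [
    ("r1;sample=S1;", "Otu1"),
    ("r2;sample=S1;", "Otu1"),
    ("r3;sample=S2;", "Otu2"),
    ("r4;sample=S2;", "Otu1"),
]
otus, samples, rows = build_table(demo_hits)
print("\t".join(["#OTU ID"] + samples))
for otu, row in zip(otus, rows):
    print("\t".join([otu] + [str(n) for n in row]))
```

A read label without a sample identifier raises an error here instead of being counted under a guessed sample, since a silently mislabelled read would corrupt the table.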
Singletons and low-quality reads
You can (and probably should) include singletons and reads that did not pass the quality filter. If they are 97% similar to an OTU sequence, they are probably good enough to count even if they do have some sequencer or PCR error.
Reads should be trimmed
The reads should be trimmed in the same way (if any) as the input sequences you used for cluster_otus.
Typical command to generate an OTU table
With correctly formatted labels, the OTU table is generated using a command like this.
usearch -usearch_global reads.fa -db otus.fa -strand plus -id 0.97 \
  -otutabout otu_table.txt -biomout otu_table.json
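Once the tabbed text file has been written, it can be checked with a short script. The Python sketch below assumes the usual layout of the QIIME classic format: a header line beginning with `#OTU ID` followed by the sample names, then one tab-separated row of integer counts per OTU. The function names are hypothetical, and the demo uses an in-memory table rather than a real otu_table.txt.

```python
import csv
import io


def read_otu_table(handle):
    """Parse a tabbed OTU table into (samples, {otu_id: {sample: count}}).

    Assumed layout: a '#OTU ID' header line naming the samples, then one
    row per OTU. Other lines starting with '#' are treated as comments.
    """
    samples = None
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            if row[0].startswith("#OTU"):
                samples = row[1:]
            continue
        if samples is None:
            raise ValueError("data row found before the '#OTU ID' header")
        table[row[0]] = dict(zip(samples, map(int, row[1:])))
    if samples is None:
        raise ValueError("no '#OTU ID' header line found")
    return samples, table


def sample_totals(samples, table):
    """Total number of assigned reads per sample."""
    return {s: sum(counts[s] for counts in table.values()) for s in samples}


# In-memory stand-in for a file such as otu_table.txt.
demo_text = "#OTU ID\tS1\tS2\nOtu1\t2\t1\nOtu2\t0\t1\n"
demo_samples, demo_table = read_otu_table(io.StringIO(demo_text))
print(sample_totals(demo_samples, demo_table))
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("otu_table.txt", newline="")`. Comparing the per-sample totals with the number of input reads per sample shows what fraction of reads was assigned to an OTU at the chosen identity threshold.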