
OTU identifiers in sequence labels

Making an OTU table
An OTU table is made by running the usearch_global command with an appropriate output file option, e.g. -otutabout. See Mapping reads to OTUs for details.

OTU sequences must have OTU identifiers
When you run usearch_global to make the OTU table, the FASTA file with the OTU sequences must have OTU identifiers in the sequence labels.

OTU identifier syntax
The OTU identifier must start with the three letters OTU (case-insensitive) and continues up to the first character that is neither alphanumeric nor an underscore. The identifier may appear anywhere in the label; it does not have to be the first field. As a special case, if the identifier starts with otu=, the first four characters are deleted. This means that you can use otu=xxx; annotations, where xxx is the OTU identifier, which in this form can be any string of characters (except a semicolon). The following labels all have the OTU identifier Otu123.

>Otu123
>Otu123;size=14;
>FA87888ZZQ;Otu123;size=14;
>FA87888ZZQ;otu=Otu123;size=14;
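
The syntax rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not the code USEARCH itself uses; the function name otu_id_from_label and the choice to check for an otu= annotation before looking for a plain identifier are assumptions.

```python
import re

# otu=xxx; annotation: the identifier is everything up to the next semicolon.
_OTU_ANNOT = re.compile(r"otu=([^;]+)", re.IGNORECASE)
# Plain identifier: "OTU" (any case) followed by letters, digits or underscores.
_OTU_PLAIN = re.compile(r"otu[A-Za-z0-9_]*", re.IGNORECASE)


def otu_id_from_label(label):
    """Return the OTU identifier found in a FASTA label, or None if there is none.

    Hypothetical helper: it follows the rules described in the text, with the
    assumption that an otu= annotation takes precedence over a plain identifier.
    """
    label = label.lstrip(">")
    match = _OTU_ANNOT.search(label)
    if match:
        return match.group(1)
    match = _OTU_PLAIN.search(label)
    return match.group(0) if match else None


labels = [
    ">Otu123",
    ">Otu123;size=14;",
    ">FA87888ZZQ;Otu123;size=14;",
    ">FA87888ZZQ;otu=Otu123;size=14;",
]
ids = [otu_id_from_label(label) for label in labels]
```

With the four example labels, every entry of ids is Otu123. A label with no OTU identifier gives None.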

How to get OTU identifiers in your labels
The simplest method is to use the option -relabel Otu when you run cluster_otus. Or, you can write your own script to relabel an existing FASTA file.
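
If you prefer a script, a minimal sketch might look like the following. It assumes you simply want to replace each label with the prefix Otu plus a running number; the function name relabel_fasta is hypothetical, and the original label (including any size= annotation) is discarded, which may not match what -relabel does in every respect.

```python
def relabel_fasta(lines, prefix="Otu"):
    """Yield FASTA lines with each label replaced by prefix + running number.

    Hypothetical helper: lines is any iterable of strings, such as an open
    file. Line endings are stripped, so the caller adds them when writing.
    """
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            count += 1
            yield f">{prefix}{count}"
        else:
            # Sequence lines are passed through unchanged.
            yield line


# Small in-memory example; with a real file, pass the open file handle instead.
records = [">seq1;size=20;", "ACGT", ">seq2;size=5;", "GGCC"]
relabeled = list(relabel_fasta(records))
```

Here relabeled is [">Otu1", "ACGT", ">Otu2", "GGCC"]. To process a file, iterate over relabel_fasta(open_file) and write each line followed by a newline.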

WARNING -- QIIME doesn't like underscores in OTU identifiers
Some of my older examples use OTU identifiers like OTU_123. Underscores in OTU identifiers can cause problems with QIIME, apparently because the Newick tree file format uses an underscore to mean a blank space (the problem only seems to occur when a tree file is used). Some USEARCH commands allow only letters, digits and underscores in OTU identifiers, so you can't use another punctuation symbol (e.g., a period) in place of the underscore. The safest choice is to use identifiers like Otu123.
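
To check a set of identifiers before passing them on to QIIME, something like this could be used. It is a sketch under the assumption that "safe" means letters and digits only, as recommended above; the function name unsafe_otu_ids is hypothetical.

```python
import re

# Letters and digits only: no underscores, periods or other punctuation.
_SAFE_ID = re.compile(r"[A-Za-z0-9]+")


def unsafe_otu_ids(ids):
    """Return the identifiers containing anything other than letters and digits."""
    return [otu_id for otu_id in ids if not _SAFE_ID.fullmatch(otu_id)]


flagged = unsafe_otu_ids(["Otu123", "OTU_123", "Otu.5"])
```

In this example flagged is ["OTU_123", "Otu.5"], while Otu123 passes the check.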