Taxonomy annotations

Taxonomy annotations are used to indicate the taxonomy of a sequence.

A taxonomy annotation is specified as a tax=nnn field in the sequence label. Here, nnn is an integer identifying a node in a taxonomy tree file. The tax=nnn field may appear anywhere in the label. It must be delimited by semicolons; the final semicolon may be omitted when the field is at the end of a label, though this is not recommended. The following label has a valid taxonomy annotation:

>KR08766;tax=2034;
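
The delimiter rule above can be sketched in Python. This is only an illustration and not how usearch itself parses labels: the function name get_tax_id is hypothetical, and the sketch assumes that a tax=nnn field at the very start of a label needs no leading semicolon.

```python
import re

# Assumption: the field is preceded by ';' (or starts the label) and is
# followed by ';' (or ends the label, since the final ';' may be omitted).
TAX_FIELD = re.compile(r"(?:^|;)tax=(\d+)(?:;|$)")


def get_tax_id(label):
    """Return the integer from the tax=nnn field, or None if it is absent."""
    # Drop the FASTA '>' marker and surrounding whitespace, if present.
    label = label.lstrip(">").strip()
    match = TAX_FIELD.search(label)
    return int(match.group(1)) if match else None


print(get_tax_id(">KR08766;tax=2034;"))  # trailing semicolon present
print(get_tax_id(">KR08766;tax=2034"))   # trailing semicolon omitted
print(get_tax_id(">KR08766;size=10;"))   # no taxonomy annotation
```

The first two calls return 2034 and the third returns None, so a caller can tell annotated labels from unannotated ones.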

I usually distribute FASTA files with labels in this format (the white space inside the label is a tab):

Taxonomic names are specified as Root;Kingdom;Phylum... down to the lowest classification level (usually genus or species). This format is compatible with the utax command and is also accepted as input for training the command-line version of the RDP Naive Bayesian Classifier. When the file is used as input to the utax command, the names are not needed; only the tax=nnn field is required. However, it is often helpful to see the names, e.g. when reviewing hits from usearch_global.
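
As a rough sketch of reading such a label back, the Python below separates the annotation part from the names. It assumes a layout in which the names follow a single tab after the annotated part of the label; that layout, the function name split_label, and the sample label (which reuses the placeholder names Root;Kingdom;Phylum) are assumptions for illustration only.

```python
def split_label(label):
    """Split a label into (annotation part, list of taxonomic names)."""
    # Assumption: everything after the first tab is the semicolon-separated
    # list of names; a label with no tab simply has no names.
    head, _, names = label.lstrip(">").rstrip("\n").partition("\t")
    # Drop empty pieces left by a trailing ';'.
    lineage = [name for name in names.split(";") if name]
    return head, lineage


head, lineage = split_label(">KR08766;tax=2034;\tRoot;Kingdom;Phylum")
print(head)     # the part carrying the tax=nnn field
print(lineage)  # names from the root down to the lowest level given
```

Keeping the two parts separate reflects the point above: a tool that needs only the tax=nnn field can ignore the names, while a person reviewing hits can still read them.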