Annotations in sequence labels

Many usearch commands support or require annotations in sequence labels. Annotations indicate attributes of a sequence such as its abundance, sample identifier, or taxonomy. There are no generally accepted standards for including annotations in FASTQ or FASTA files, so annotations are not usually compatible with other software packages.

Most annotations have the form name=value.

In usearch, annotations are separated by semi-colons. The first annotation begins at the first semi-colon. The label up to the first semi-colon is sometimes understood to be an implied name, e.g. an OTU identifier.

A semi-colon terminating the last annotation at the end of the label is optional, but recommended.
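
The rules above can be sketched in a few lines of Python. This is only an illustration of the label syntax, not part of usearch: the function name parse_label and the example label are assumptions made for this sketch. It treats the text before the first semi-colon as the implied name, accepts a label with or without a terminating semi-colon, and stores None for any annotation that does not have the form name=value.

```python
def parse_label(label):
    """Split a sequence label into its prefix and a dict of annotations.

    Hypothetical helper for illustration; not a usearch function.
    """
    fields = label.split(";")
    prefix = fields[0]  # text before the first semi-colon (implied name)
    annotations = {}
    for field in fields[1:]:
        if not field:
            continue  # empty field produced by a terminating semi-colon
        name, sep, value = field.partition("=")
        # Annotations without "=" are kept with a value of None.
        annotations[name] = value if sep else None
    return prefix, annotations


# Example label (made up for this sketch).
prefix, annotations = parse_label("Otu1;size=123;sample=S1;")
print(prefix, annotations)
```

Splitting on every semi-colon and skipping empty fields means the optional terminating semi-colon needs no special handling.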

White space (blanks and tabs) is allowed within annotations, but is discouraged because it can cause problems. For example, if a sequence label contains a tab and is written to a tab-separated text file, then that line will have more fields than expected.
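
The tab problem can be shown with a short Python sketch. The helper name count_tsv_fields, the two-column layout (label, then a count), and the example labels are assumptions made for illustration only.

```python
def count_tsv_fields(label, count):
    """Write label and count as one tab-separated line, then count its fields.

    Hypothetical helper for illustration; not a usearch function.
    """
    line = "\t".join([label, str(count)])
    return len(line.split("\t"))


# A label without white space yields the intended two fields.
clean_fields = count_tsv_fields("Otu1;size=5;", 5)

# A label containing a tab yields an extra field on the same line.
tabbed_fields = count_tsv_fields("Otu1;note=a\tb;", 5)

print(clean_fields, tabbed_fields)
```

A reader of the file that expects two columns would misinterpret the second line, which is why labels without white space are safer.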