Annotations in sequence labels

Robert C. Edgar on twitter


See also
  Size annotations
  Sample identifiers
  OTU identifiers
  fastx_strip_annots command
  fastx2qiime command

Many usearch commands support or require annotations in sequence labels. Annotations indicate attributes of a sequence such as its abundance, sample identifier or taxonomy. There is no generally accepted standard for including annotations in FASTQ or FASTA files, so usearch annotations are usually not compatible with other software packages.

Most annotations have the form name=value.

In usearch, annotations are separated by semi-colons, and the first annotation begins immediately after the first semi-colon. The part of the label before the first semi-colon is sometimes interpreted as an implied name, e.g. an OTU identifier.
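These conventions can be illustrated with a minimal Python sketch. parse_label is a hypothetical helper, not part of usearch. It assumes the label is passed without the leading > or @ of the FASTA/FASTQ header line, and the label Otu1;size=1234;sample=S1; is made up for illustration. An optional trailing semi-colon produces an empty field, which the sketch skips. A field without = is kept with an empty value; this is an assumption, not documented usearch behavior.

```python
def parse_label(label):
    """Split a usearch-style label into (implied name, annotations dict).

    Hypothetical helper: the text before the first ';' is the implied name,
    and each remaining non-empty field is treated as name=value.
    """
    fields = label.split(";")
    name = fields[0]
    annots = {}
    for field in fields[1:]:
        if not field:
            # Empty field, e.g. produced by an optional trailing ';'
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        # A field with no '=' gets an empty value (assumption, see lead-in).
        key, _, value = field.partition("=")
        annots[key] = value
    return name, annots


name, annots = parse_label("Otu1;size=1234;sample=S1;")
print(name)    # Otu1
print(annots)  # {'size': '1234', 'sample': 'S1'}
```

Because the trailing semi-colon only yields an empty field, the same function accepts labels with or without it.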

A semi-colon after the last annotation, at the end of the label, is optional but recommended.
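A script that adds annotations can follow this recommendation by terminating the existing label before appending. The sketch below uses a hypothetical helper, add_annotation, with made-up labels; it is an illustration, not a usearch command.

```python
def add_annotation(label, key, value):
    """Append key=value to a usearch-style label, ending with ';'.

    Hypothetical helper for illustration only.
    """
    if not label.endswith(";"):
        # Terminate the last existing field (or a bare name) first
        label += ";"
    return f"{label}{key}={value};"


print(add_annotation("Otu1", "size", 1234))            # Otu1;size=1234;
print(add_annotation("Otu1;sample=S1", "size", 1234))  # Otu1;sample=S1;size=1234;
```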

White space (blanks and tabs) is allowed within annotations but is discouraged because it can cause problems. For example, if a sequence label contains a tab and is written to a tab-separated text file, that line will appear to have an extra field, so the columns no longer line up.
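The following Python sketch shows the tab problem concretely. The label and the helper has_whitespace are hypothetical. The row is meant to hold two fields (the label and a count), but a reader that splits on tabs sees three.

```python
def has_whitespace(label):
    """Return True if the label contains a blank or a tab (hypothetical check)."""
    return any(c in " \t" for c in label)


label = "Otu1;sample=S1\tA;"  # illustrative label with a tab inside an annotation
row = label + "\t" + "1234"   # intended: two tab-separated fields

print(len(row.split("\t")))   # 3, not the intended 2
print(has_whitespace(label))  # True
```

A check like has_whitespace can be run before writing labels to tab-separated output.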