
Cluster quality and sequence identity

USEARCH uses the BLAST definition of sequence identity. Through version 5, USEARCH used the CD-HIT definition by default.

For a given alignment, BLAST identity <= CD-HIT identity. This is because BLAST counts every gap as a difference, but CD-HIT sometimes does not. Insertions and deletions are generally less probable than substitutions. Therefore, gaps should count at least as much as substitutions as a measure of evolutionary distance, and the BLAST definition is more biologically realistic.
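
The difference can be illustrated with a small sketch. It assumes the commonly cited definitions, which are not spelled out on this page: BLAST identity is taken as identical columns divided by the total number of alignment columns (gapped columns included), and CD-HIT identity as identical columns divided by the length of the shorter sequence. The function name `alignment_identities` and the example alignment are hypothetical and are not part of USEARCH, BLAST, or CD-HIT.

```python
def alignment_identities(row_a, row_b, gap="-"):
    """Return (blast_identity, cdhit_identity) for one pairwise alignment.

    row_a and row_b are the two aligned rows, padded with `gap` so that
    they have equal length. The definitions used here are assumptions:
      BLAST  = identical columns / all alignment columns
      CD-HIT = identical columns / length of the shorter sequence
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")

    # A column is an identity only if both rows hold the same letter (not a gap).
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)

    columns = len(row_a)
    # Ungapped sequence lengths, recovered by removing the gap characters.
    shorter = min(len(row_a.replace(gap, "")), len(row_b.replace(gap, "")))
    if columns == 0 or shorter == 0:
        raise ValueError("alignment must contain at least one residue per row")

    return matches / columns, matches / shorter


# Hypothetical alignment with a two-column deletion in the second row.
blast_id, cdhit_id = alignment_identities("ACGTACGT", "ACG--CGT")
print(blast_id, cdhit_id)  # 0.75 1.0
```

In this example the two gapped columns lower the BLAST identity to 0.75, while the CD-HIT identity stays at 1.0 because every letter of the shorter sequence is matched. Under these assumed definitions, a global alignment always has at least as many columns as the shorter sequence has letters, which is why the BLAST value cannot exceed the CD-HIT value.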