Cluster quality and sequence identity
USEARCH uses the BLAST definition of sequence identity. Through version 5, USEARCH used the CD-HIT definition by default.
For a given alignment, BLAST identity <= CD-HIT identity. This is because BLAST counts gaps as differences, but CD-HIT sometimes does not. Insertions and deletions are generally less probable than substitutions. Therefore, gaps should count at least as much as substitutions as a measure of evolutionary distance, and the BLAST definition is more biologically realistic.
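
The difference between the two definitions can be illustrated with a minimal Python sketch. It rests on assumptions not stated above: BLAST identity is taken as identical columns divided by the total number of alignment columns (gap columns included), and CD-HIT identity as identical columns divided by the length of the shorter sequence. The function names and the example alignment are hypothetical and are not part of USEARCH, and the two rows are assumed to come from a global alignment that uses "-" for gaps.

```python
def count_matches(row_a, row_b):
    """Count alignment columns where both rows hold the same letter."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")


def blast_identity(row_a, row_b):
    """Assumed BLAST-style identity: matches / alignment columns.

    Gap columns are part of the denominator, so each gap counts
    as a difference.
    """
    return count_matches(row_a, row_b) / len(row_a)


def cdhit_identity(row_a, row_b):
    """Assumed CD-HIT-style identity: matches / shorter sequence length.

    Gap characters are stripped before measuring the sequence lengths,
    so gaps opposite the longer sequence do not lower the identity.
    """
    len_a = len(row_a.replace("-", ""))
    len_b = len(row_b.replace("-", ""))
    return count_matches(row_a, row_b) / min(len_a, len_b)


# Hypothetical global alignment with a two-letter deletion in the second row.
row_a = "ACGTACGT"
row_b = "ACG--CGT"

print(blast_identity(row_a, row_b))  # 0.75 (6 matches / 8 columns)
print(cdhit_identity(row_a, row_b))  # 1.0  (6 matches / 6 letters)
```

In this example the two-letter gap lowers the BLAST-style value to 0.75 but leaves the CD-HIT-style value at 1.0, which is consistent with BLAST identity <= CD-HIT identity.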