 

cluster quality and sequence identity


USEARCH uses the BLAST definition of sequence identity. Through version 5, USEARCH used the CD-HIT definition by default.

For a given alignment, BLAST identity <= CD-HIT identity. This is because BLAST counts gaps as differences, but CD-HIT sometimes does not. Insertions and deletions are generally less probable than substitutions. Therefore, when identity is used as a measure of evolutionary distance, a gap should count at least as much as a substitution, which makes the BLAST definition more biologically realistic.
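To make the difference concrete, here is a minimal Python sketch that scores one pairwise alignment both ways. It assumes the commonly cited formulations: BLAST identity is identical columns divided by the number of alignment columns (gap columns included), and CD-HIT identity is identical columns divided by the length of the shorter ungapped sequence. The function names are hypothetical and not part of USEARCH or CD-HIT, and either tool may treat edge cases such as terminal gaps differently.

```python
def count_identities(row_a: str, row_b: str, gap: str = "-") -> int:
    """Count alignment columns where both rows have the same non-gap letter."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)


def blast_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Assumed BLAST-style identity: identities / alignment columns.

    Every column counts in the denominator, so a gap is a difference.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return count_identities(row_a, row_b, gap) / len(row_a)


def cdhit_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Assumed CD-HIT-style identity: identities / shorter sequence length.

    The denominator ignores gap columns, so a gap in the shorter
    sequence is not counted as a difference.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    shorter = min(len(row_a.replace(gap, "")), len(row_b.replace(gap, "")))
    return count_identities(row_a, row_b, gap) / shorter


# One deletion in the second sequence: 7 identities over 8 columns.
a = "ACGTACGT"
b = "ACGT-CGT"
print(blast_identity(a, b))  # 0.875
print(cdhit_identity(a, b))  # 1.0
```

Here the single gap lowers the BLAST-style value but leaves the CD-HIT-style value at 100%. Because every residue of the shorter sequence occupies its own column, the number of columns can never be smaller than the shorter sequence length, which is why the BLAST-style value is always less than or equal to the CD-HIT-style value.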
