I am sometimes asked about CD-HIT and how CD-HIT clustering compares
with the UCLUST algorithm in USEARCH. Below are links to pages
discussing issues around CD-HIT and assessment of clustering methods.
CD-HIT and USEARCH methods report different pair-wise identities
Example where CD-HIT id is 97% and USEARCH id is 86%
Example where CD-HIT id is 97% and USEARCH id is 95%
CD-HIT alignment errors due to banding
CD-HIT reports systematically higher %ids compared to USEARCH
CD-HIT has low gap penalties and mismatch scores
CD-HIT and USEARCH v5 results on 16S rRNA reads from Costello et al.
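
As a rough illustration of how two methods can report different pair-wise identities for the same pair of sequences, the sketch below scores one toy alignment with two different denominators: the length of the shorter sequence, and the number of alignment columns excluding terminal gaps. The function names, the toy sequences and the choice of these two definitions are assumptions made for illustration. Neither function is claimed to be the exact formula used by CD-HIT or USEARCH, and the linked pages remain the reference for the actual comparison.

```python
# Illustrative sketch only: two common ways to turn one pairwise alignment
# into a percent identity. These are hypothetical helpers, not the code of
# either program.

GAP = "-"


def count_matches(row_a, row_b):
    """Count columns where both rows hold the same non-gap letter."""
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != GAP)


def identity_over_shorter(row_a, row_b):
    """Matches divided by the ungapped length of the shorter sequence."""
    shorter = min(len(row_a.replace(GAP, "")), len(row_b.replace(GAP, "")))
    if shorter == 0:
        return 0.0
    return count_matches(row_a, row_b) / shorter


def identity_over_columns(row_a, row_b):
    """Matches divided by alignment columns, terminal gaps excluded."""
    # Columns where both rows hold a letter mark the ends of the overlap.
    both = [i for i, (x, y) in enumerate(zip(row_a, row_b))
            if x != GAP and y != GAP]
    if not both:
        return 0.0
    columns = both[-1] - both[0] + 1  # internal gap columns are counted
    return count_matches(row_a, row_b) / columns


# Toy alignment with a two-letter internal gap in the second row.
row_a = "ACGTACGTAC"
row_b = "ACG--CGTAC"

print(identity_over_shorter(row_a, row_b))  # 1.0
print(identity_over_columns(row_a, row_b))  # 0.8
```

In this toy case the internal gap does not lower the first measure at all, while the second measure drops to 80%, so a fixed threshold such as 97% can accept or reject the same pair depending on the definition in use.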