MUSCLE manual
Problems with BALIBASE
 
For several years, BALIBASE was an influential benchmark set of multiple alignments. Many algorithm developers, myself included, tried to achieve improvements in alignment accuracy on BALIBASE, and emphasized reporting BALIBASE results in the MUSCLE papers. I now believe this was a mistake due to fundamental flaws in BALIBASE. These flaws are discussed in detail in my paper Quality measures for protein alignment benchmarks.
 
Problem   Discussion Example
click image for details
Most proteins in BALIBASE do not have known structure.   Most BALIBASE alignments were obtained by sequence methods. In these cases, comparing MUSCLE to BALIBASE is comparing two different sequence alignment methods. Implicitly, the authors of BALIBASE are claiming that their sequence alignment methods are so much better than methods they are testing that they can be used as a gold standard. I'm not convinced.
 
Many sets are not globally alignable.   Many of the BALIBASE sets contain proteins that are not globally alignable because they contain different domains in different orders. Often, only a single, small domain is common to all proteins in the input set. Most popular methods like CLUSTALW, MUSCLE, T-COFFEE, PROBCONS etc. are global aligners that assume the input sequences differ only by small mutations with no major rearrangements. This assumption is grossly violated by BALIBASE.
 
Non-homologous regions are aligned   Many sets align regions that are definitively not homologous or structurally similar.
 
Many structural alignments are wrong   Even in regions that have some structural similarity, the BALIBASE alignments have obvious mistakes according to their own alignment criteria, e.g. aligning surface loops to buried core secondary structure.
 
Incorrect core blocks   Core blocks should have unambiguously conserved secondary structure to ensure that structure and sequence alignments agree. In many cases, they do not.