README.TXT file for BALI2DNAF.

The tarball ref.tz contains the reference alignments. To extract,
change to the directory where you want the alignments and use:

tar -zxf ref.tz

Reference alignments are in FASTA format, and are annotated as follows.

Upper-case letters and dashes (-) indicate core blocks in the
original amino acid alignments in BALIBASE v2.

Lower-case letters and dots (.) indicate regions outside core blocks
which are not reliably aligned in BALIBASE.

Gaps due to the insertion of one or two spurious letters in one
sequence in each set are indicated as exclamation marks (!).

To create BALI2DNA, strip all columns containing an exclamation mark
in one or more sequences.
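
The column-stripping step above can be sketched in Python as follows.
This is a minimal illustration, not the script used to build BALI2DNA.
The function name strip_frameshift_columns and the sample rows are
hypothetical, and the aligned rows are assumed to have already been read
from the FASTA file into equal-length strings.

```python
def strip_frameshift_columns(aligned_seqs):
    """Drop every alignment column in which any sequence has '!'.

    aligned_seqs: list of equal-length aligned rows (strings).
    Hypothetical helper, not part of MUSCLE or BALIBASE.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep only the columns where no row carries an exclamation mark.
    keep = [i for i in range(length)
            if all(s[i] != "!" for s in aligned_seqs)]
    return ["".join(s[i] for i in keep) for s in aligned_seqs]


# Made-up rows: the second sequence has one spurious letter in column 4.
rows = ["ACG!TTa.", "ACGCTTag", "AC-!TTa."]
print(strip_frameshift_columns(rows))
```

Note that the spurious letter is removed along with the exclamation
marks, because the whole column is dropped.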

To reconstruct input sequences, strip all gap characters (.-!) from
the reference alignment.
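
As a rough sketch, the gap characters can be removed from one aligned
row like this in Python. The names GAP_CHARS and ungap are hypothetical,
and the sample row is made up. Letter case is left unchanged, since this
README does not say whether the input sequences should be case-folded.

```python
GAP_CHARS = ".-!"  # the three gap characters used in the reference alignments


def ungap(aligned_seq):
    """Return the row with all gap characters (. - !) removed."""
    # str.maketrans with a third argument maps those characters to None.
    return aligned_seq.translate(str.maketrans("", "", GAP_CHARS))


print(ungap("AC-!TTa."))
```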

The typical command line used for frame-shift detection in MUSCLE is:

muscle4 --input seqs.fasta --quiet --frameshift_only --log muscle.log --cons 0

The log files produced by MUSCLE on BALI2DNA and BALI2DNAF are in the
bali2dna[f]_log.tz tarballs, which can also be extracted using
tar -zxf <filename>.

Contact: bob@drive5.com.
