WARNING
Please note that version 1.1 is not backwards
compatible with v1.0. I apologize for any inconvenience -- I
understand the importance of backwards compatibility, and I usually
strive to maintain it. However, I believe the new design is
significantly better, and I chose to bite the bullet now while there are
relatively few users. The major changes are as follows.
The .uc file format has changed
The tab-separated .uc file format now has one more field that
gives the label of the target sequence (i.e., the database or seed
sequence that matched the query). In v1.0, the last field was the query
label. In v1.1, the last two fields are now the query label and target
label. Since tabs in the labels would cause problems parsing the file,
tabs in labels are now replaced by spaces when FASTA files are read.
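As an illustration of the new layout, here is a minimal Python sketch that pulls the query and target labels off the end of a v1.1 record. It relies only on what is stated above: fields are tab-separated and the last two are the query label and the target label. The function names and the sample record are hypothetical, and the leading fields are passed through untouched because their meaning is not described here.

```python
def sanitize_label(label):
    """Replace tabs with spaces, mirroring what v1.1 does to FASTA labels."""
    return label.replace("\t", " ")


def split_uc_labels(line):
    """Split one v1.1 .uc record into (other_fields, query_label, target_label).

    Assumes only that the record is tab-separated and that the last two
    fields are the query label and the target label, in that order.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least two tab-separated fields")
    return fields[:-2], fields[-2], fields[-1]


# Hypothetical record: "x" and "y" stand in for the leading fields.
other, query, target = split_uc_labels("x\ty\tread42 sample A\tseed7\n")
print(query, "->", target)
```

A parser written for v1.0 that takes the last field as the query label will pick up the target label instead when given a v1.1 file, so scripts that index from the end of the line need updating.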
Command-line options have changed
Some options have been removed (e.g., --termgaps, --gapdiffs
and --termgapdiffs), and some have been given better names, e.g. --exactalign
is now --nofastalign and --mem is now --split.
Different definition of identity
In v1.0, the default definition of identity was (number of identical
letter-letter columns)/(number of letter-letter columns). This sometimes
gave undesirable results due to short regions of high identity in
otherwise gappy alignments. In v1.1, identity is defined as (number of
identical letter-letter columns)/(length of shorter sequence). I believe this will
meet the needs of all current users without the need for --gapdiffs or
--termgapdiffs options. If this causes problems for you, please let me
know and I'll add appropriate options for your needs.
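To make the difference concrete, the following Python sketch computes both definitions from a pairwise alignment given as two equal-length rows. It is an illustration under stated assumptions, not the program's own code: gaps are assumed to be written as "-", letters are compared case-insensitively, and the function name is hypothetical.

```python
def identities(query_row, target_row, gap="-"):
    """Return (v1.0 identity, v1.1 identity) for two aligned rows."""
    if len(query_row) != len(target_row):
        raise ValueError("aligned rows must have the same length")

    letter_cols = 0  # columns with a letter in both rows
    identical = 0    # letter-letter columns where the letters match
    for q, t in zip(query_row, target_row):
        if q == gap or t == gap:
            continue
        letter_cols += 1
        if q.upper() == t.upper():
            identical += 1

    # Ungapped sequence lengths; the shorter one is the v1.1 denominator.
    query_len = sum(1 for c in query_row if c != gap)
    target_len = sum(1 for c in target_row if c != gap)
    shorter = min(query_len, target_len)

    v10 = identical / letter_cols if letter_cols else 0.0
    v11 = identical / shorter if shorter else 0.0
    return v10, v11


# A gappy alignment: every letter-letter column matches, but two letters
# of each sequence are aligned to gaps.
print(identities("ACGT--TTGA", "ACGTCC--GA"))  # (1.0, 0.75)
```

In this example the v1.0 definition reports 100% identity because gapped columns are ignored, while the v1.1 definition reports 75% because the denominator is the length of the shorter sequence (8 letters, 6 of them in identical columns).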
Improved control over alignment
Version 1.1 provides a rich model of gap penalties which
allows fine control over the style of alignment that best models your
data. You can specify different open and extend penalties for internal
and terminal gaps, for left- and right-end gaps, and different penalties
for query vs. target sequence. This is explained in detail in the
manual.
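As a rough illustration of what such a penalty model looks like, the Python sketch below keeps a separate open/extend pair for each combination of sequence (query or target) and gap position (left end, internal, right end). Everything in it is an assumption made for the example: the names, the numeric values, and the convention that a gap of length n costs open + (n - 1) * extend. None of it reflects the program's actual options or defaults; see the manual for those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapPenalty:
    open: float    # cost of the first gapped position
    extend: float  # cost of each additional gapped position

    def cost(self, length):
        """Affine cost of one gap (assumed convention: open + (n-1)*extend)."""
        if length <= 0:
            return 0.0
        return self.open + (length - 1) * self.extend


# Illustrative values only: cheap terminal gaps, expensive internal gaps.
PENALTIES = {
    ("query", "left"): GapPenalty(1.0, 0.5),
    ("query", "internal"): GapPenalty(10.0, 1.0),
    ("query", "right"): GapPenalty(1.0, 0.5),
    ("target", "left"): GapPenalty(2.0, 0.5),
    ("target", "internal"): GapPenalty(10.0, 1.0),
    ("target", "right"): GapPenalty(2.0, 0.5),
}


def gap_cost(sequence, position, length):
    """Look up the penalty pair for this kind of gap and score it."""
    return PENALTIES[(sequence, position)].cost(length)


print(gap_cost("query", "internal", 3))  # 12.0
print(gap_cost("query", "left", 4))      # 2.5
```

With values like these, a long overhang at either end of the alignment is penalized far less than an indel of the same length in the middle, which is the kind of stylistic choice the separate penalties are meant to express.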
Improved alignment quality
In some applications, the heuristics used to achieve very
fast alignment speeds (HSPs and banding) could give
poor-quality alignments by finding spurious HSPs and/or by missing
regions of high similarity due to long indels. Version 1.1 improves
alignment quality through increased stringency in HSP identification and
increased band width. This may sometimes result in slower execution
times; you can adjust parameters to achieve higher speeds if needed. A
new feature is provided that automatically compares all alignments made
by fast heuristics to an "exact" Needleman-Wunsch alignment.
This allows you to evaluate the speed / sensitivity trade-off on typical
data for your applications.