UTAX alnout file
The -alnout option of the utax command gives the name of a
text file for human-readable alignments. This option is supported in v8.1.1800.
See example utax alnout file.
The report includes a rank table showing the top hit and the nearest
neighbor at each rank; example below.
Rank names are g for genus etc. (see
taxonomy annotations). A + indicates the top hit,
which gives the predicted taxonomy. The nearest neighbor is given for each rank,
i.e. the database sequence with highest identity having a different name at that
rank. PctId is the identity from a pair-wise
alignment of the reference sequence (target) to the query, WordId is the
k-mer identity (fraction of unique query words that are also found in the
target), Raw is the UTAX algorithm raw score and Pvalue is the
P-value for the prediction at that rank. The Target column gives the accession (i.e., the
reference sequence label after annotations have been stripped), followed by the
taxon names at that rank. The first name is the taxon in the top hit, the second
name (if any) is the taxon of the nearest neighbor target sequence at that rank.
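The WordId definition above (the fraction of unique query words that are also found in the target) can be illustrated with a short Python sketch. This is not the UTAX implementation. The function names and the word length k=8 are assumptions chosen for illustration only.

```python
def unique_words(seq, k):
    """Return the set of distinct words (k-mers) of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_identity(query, target, k=8):
    """Fraction of unique query words that also occur in the target.

    k=8 is an illustrative assumption, not a documented UTAX default.
    Returns 0.0 if the query is shorter than k and so has no words.
    """
    query_words = unique_words(query, k)
    if not query_words:
        return 0.0
    shared = query_words & unique_words(target, k)
    return len(shared) / len(query_words)
```

Note that the measure is asymmetric: it is normalized by the number of unique words in the query, so swapping query and target can change the value.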
Reviewing the rank table
Confidence in a taxon prediction should be high if the identity with the top
hit is high, but lower if the identity of the nearest neighbor at that rank
is close to the top hit.
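As a rough sketch of this rule of thumb, the check below flags a rank when the nearest neighbor's identity is within a small margin of the top hit's identity. The function name and the 2.0 percentage-point margin are hypothetical choices for illustration. UTAX does not define such a threshold, and this is no substitute for the reported P-value.

```python
def is_ambiguous(top_pctid, neighbor_pctid, min_margin=2.0):
    """Return True if the nearest neighbor is too close to the top hit.

    top_pctid and neighbor_pctid are percent identities (0 to 100) taken
    from the PctId column for the top hit and the nearest neighbor at one rank.
    min_margin is an arbitrary illustrative cutoff, not a UTAX parameter.
    """
    margin = top_pctid - neighbor_pctid
    return margin < min_margin
```

For instance, a top hit and a nearest neighbor that are both at 83% identity would be flagged, because the margin between them is zero.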
In the above example, we see that the identity of the
top hit is 83%. This is too low to expect the genus to be the same, so the low
P-value (0.0045) seems reasonable.
The nearest neighbor at family rank is very close to
the top hit: both are 83% to two significant figures. This is much too close
to believe that the top hit is significantly closer, and even if it were, this
could be an artifact. We should therefore be skeptical of the family assignment,
and again a low P-value (0.14) is reasonable. This confirms that the genus is
unlikely to be the same as the top hit.
It is more plausible that the class and phylum are the
same at 83% identity, so the higher P-values (0.72 for class and 0.997 for
phylum) seem reasonable.