UTAX alnout file

The -alnout option of the utax command gives the name of a text file for human-readable alignments. This option is supported in v8.1.1800 and later.

Rank table
The report for each query includes a rank table showing the top hit and the nearest neighbor at each rank; example below.

Rank names are abbreviated to a single letter, e.g. g for genus (see taxonomy annotations). A + indicates the top hit, which gives the predicted taxonomy. The nearest neighbor is given for each rank, i.e. the database sequence with highest identity having a different name at that rank. PctId is the identity from a pair-wise alignment of the reference sequence (target) to the query. WordId is the k-mer identity (fraction of unique query words that are also found in the target). Raw is the UTAX algorithm raw score, and Pvalue is the confidence.
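The WordId definition above can be illustrated with a minimal Python sketch. This is not the UTAX implementation: the function name word_identity and the default word length k=8 are assumptions made for illustration, and UTAX may count or weight words differently.

```python
def word_identity(query: str, target: str, k: int = 8) -> float:
    """Fraction of unique query k-mers (words) that also occur in the target.

    Illustrative only: the word length k is an assumed parameter,
    not a value taken from the UTAX documentation.
    """
    # Collect the set of unique words of length k in each sequence.
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    target_words = {target[i:i + k] for i in range(len(target) - k + 1)}
    if not query_words:
        # Query is shorter than k, so it has no words to compare.
        return 0.0
    return len(query_words & target_words) / len(query_words)


# The query words are ACGT, CGTA and GTAC; only ACGT occurs in the target.
print(word_identity("ACGTAC", "ACGTTT", k=4))
```

Because the denominator is the number of unique query words, the measure is not symmetric: swapping query and target can give a different value.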

The Target column gives the accession (i.e., the reference sequence label after annotations have been stripped), followed by the taxon names at that rank. The first name is the taxon in the top hit, the second name (if any) is the taxon of the nearest neighbor target sequence at that rank.

Reviewing the rank table
Confidence in a taxon prediction should be high if the identity with the top hit is high, but lower if the identity of the nearest neighbor at that rank is close to that of the top hit.

In the above example, we see that the identity of the top hit is 83%. This is too low to expect the genus to be the same, so the low P-value (0.0045) seems reasonable.

The nearest neighbor at family rank is very close to the top hit: both are 83% to two significant figures. This is much too close to believe that the top hit is significantly closer, and even if it were, the difference could be an artifact. We should therefore be skeptical of the family assignment, and again a low P-value (0.14) is reasonable. This confirms that the genus is unlikely to be the same as the top hit.

It is more plausible that the class and phylum are the same at 83% identity, so the higher P-values (0.72 for class and 0.997 for phylum) seem reasonable.
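One way to act on this review is to keep only the ranks whose confidence reaches a chosen threshold. The sketch below uses the four P-values quoted in the example; the function name confident_ranks and the cutoff of 0.9 are hypothetical choices for illustration, not settings taken from the utax command.

```python
def confident_ranks(confidences, cutoff=0.9):
    """Return the ranks whose confidence is at least the cutoff.

    The cutoff is a hypothetical threshold chosen for illustration.
    """
    return [rank for rank, p in confidences.items() if p >= cutoff]


# P-values quoted in the example discussed in this section.
example = {"genus": 0.0045, "family": 0.14, "class": 0.72, "phylum": 0.997}

print(confident_ranks(example))              # ['phylum']
print(confident_ranks(example, cutoff=0.5))  # ['class', 'phylum']
```

With a strict cutoff only the phylum prediction survives, which matches the reasoning above: at 83% identity the deeper ranks are plausible while genus and family are not.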