SINAPS algorithm

SINAPS (Simple Non-Bayesian Attribute Prediction Software) predicts attributes (traits) for a marker gene sequence. From an abstract perspective, SINAPS is essentially the same algorithm as SINTAX. A reference database is provided in which each marker gene sequence is annotated with the trait to be predicted. A random subset of words is extracted from the query sequence and used to find the reference database sequence (the top hit) with the most words in common. This process is repeated 100 times, and the trait that occurs most frequently among the top hits is reported as the predicted trait for the query. The number of iterations in which that trait was the top hit is reported as the bootstrap confidence value.
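
The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SINAPS implementation: the function names (kmers, predict_trait), the word length k=8, the subset size of 32 words, and the tie-breaking rule (first reference sequence wins) are all assumptions made for the example, since the text above does not specify them. A brute-force scan over the reference is also used here in place of whatever indexing the real software uses.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct words (k-mers) of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_trait(query, reference, k=8, subset_size=32, iterations=100, rng=None):
    """Predict a trait for query by bootstrapped top-hit voting.

    reference is a list of (sequence, trait) pairs.
    k and subset_size are assumed values, not taken from the SINAPS text.
    Returns (predicted_trait, bootstrap_count).
    """
    rng = rng or random.Random()
    query_words = sorted(kmers(query, k))
    if not query_words or not reference:
        raise ValueError("query must contain at least one word and reference must be non-empty")

    # Precompute the word set of every annotated reference sequence.
    ref_words = [(kmers(seq, k), trait) for seq, trait in reference]

    votes = Counter()
    for _ in range(iterations):
        # Draw a random subset of the query's words.
        n = min(subset_size, len(query_words))
        subset = set(rng.sample(query_words, n))

        # Top hit: the reference sequence sharing the most words with the subset.
        # Ties go to the earliest reference entry (an assumption of this sketch).
        _, top_trait = max(ref_words, key=lambda item: len(subset & item[0]))
        votes[top_trait] += 1

    # The most frequent top-hit trait, and the number of iterations that chose it.
    trait, count = votes.most_common(1)[0]
    return trait, count
```

Calling predict_trait(query, reference) with the default 100 iterations returns the predicted trait together with a count between 1 and 100, which plays the role of the bootstrap confidence value described above.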