**See also**
RDP Naive Bayesian Classifier algorithm

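As a starting point, the simplest estimate of P(w_i | G) is the frequency
observed in the training set, m(w_i)/M, where m(w_i) is the number of training
sequences in genus G that contain word w_i and M is the number of training
sequences in G. It would make sense to add pseudo-counts to account for words
that never occur in a genus's training sequences. However, I don't understand
why they chose specifically to add the word-specific prior P_i to the numerator
and 1 to the denominator, giving (m(w_i) + P_i)/(M + 1).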
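To make the question concrete, here is a minimal sketch of the two estimates. It assumes the word prior is P_i = (n(w_i) + 0.5)/(N + 1), where n(w_i) is the number of training sequences (across all genera) containing w_i and N is the total number of training sequences. The function and parameter names are hypothetical. One way to read the formula, shown in the comments, is as adding a single pseudo-sequence to genus G that contains w_i with probability P_i. With no data for a genus the estimate falls back to P_i, and as M grows it approaches m(w_i)/M.

```python
from fractions import Fraction


def word_prior(n_wi: int, n_total: int) -> Fraction:
    """Hypothetical helper: word-specific prior P_i = (n(w_i) + 0.5) / (N + 1).

    n_wi    -- number of training sequences (all genera) containing word w_i
    n_total -- total number of training sequences N
    """
    return (Fraction(n_wi) + Fraction(1, 2)) / (n_total + 1)


def word_given_genus(m_wi: int, m_genus: int, prior: Fraction) -> Fraction:
    """Hypothetical helper: P(w_i | G) = (m(w_i) + P_i) / (M + 1).

    Equivalent to adding one pseudo-sequence to genus G (hence the +1 in
    the denominator) that contains w_i with probability P_i (hence the
    +P_i in the numerator).
    """
    return (m_wi + prior) / (m_genus + 1)


# Example: w_i occurs in 30 of 100 training sequences overall.
p_i = word_prior(30, 100)
print(p_i)                             # prior for this word
print(word_given_genus(0, 0, p_i))     # empty genus: falls back to P_i
print(word_given_genus(0, 5, p_i))     # unseen in 5 sequences: small but > 0
print(word_given_genus(5, 5, p_i))     # seen in all 5: still below 1
```

Because the pseudo-count has total weight 1, it only dominates when a genus has very few training sequences; for well-sampled genera the observed frequency takes over.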

**Reference** Wang, Q.