 
USEARCH v11

Random forest parameter file

See also
  Random forest classifiers
  OTU importance
  forest_train command
  otutab_forest_train command

A random forest is trained using the forest_train or otutab_forest_train command, which generates a parameter file. Random forest parameter files are tab-separated text files.

Comment lines start with a hash sign (#) and report accuracy metrics. Reported metrics include:

#err = error rate.
#meanpe = mean probability of error.
#mse = mean squared error.
#oob_err = out-of-bag error rate.
#oob_meanpe = out-of-bag mean probability of error.
#oob_mse = out-of-bag mean squared error.
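The comment lines can be pulled out of a parameter file with grep. The sketch below wraps this in a hypothetical shell function, forest_metrics, which is not part of USEARCH. It assumes only that each metric line starts with # followed by the metric name and then a non-word character such as a tab, space or =; the exact layout after the name is not described here.

```shell
# Hypothetical helper (not part of USEARCH).
#   forest_metrics FILE          print every comment (metric) line
#   forest_metrics FILE METRIC   print only the line for METRIC
# grep -w requires a non-word character after the name, so asking for
# "oob_err" does not also match "oob_err_w".
forest_metrics() {
    if [ $# -ge 2 ]; then
        grep -w "^#$2" "$1"
    else
        grep '^#' "$1"
    fi
}
```

For example, forest_metrics forest.txt oob_err would print only the out-of-bag error rate line.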

Metrics ending in _w are calculated with category weighting. For example, mse_w is the mean squared error with category weighting: each observation is weighted by 1/n, where n is the number of test observations in that observation's category.
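As a worked illustration of category weighting, the sketch below computes both an unweighted and a weighted mean of a per-observation error. The function name weighted_mean_error and its input format (one test observation per line: category, tab, error) are hypothetical. Dividing the weighted sum by the total weight is also an assumption; USEARCH computes these metrics internally and its exact normalisation is not stated here.

```shell
# Hypothetical sketch: unweighted vs. category-weighted mean error.
# Input: one line per test observation, "category<TAB>error".
# Each observation gets weight 1/n, where n is the number of observations
# in its category; the weighted sum is divided by the total weight
# (assumed normalisation).
weighted_mean_error() {
    awk -F'\t' '
        { cat[NR] = $1; err[NR] = $2; n[$1]++ }
        END {
            for (i = 1; i <= NR; i++) {
                w = 1 / n[cat[i]]
                sum += err[i]
                wsum += w * err[i]
                wtot += w
            }
            printf "mean\t%.4f\nmean_w\t%.4f\n", sum / NR, wsum / wtot
        }' "$1"
}
```

With three observations of category A (error 0) and one of category B (error 1), the unweighted mean is 0.25 but the weighted mean is 0.5, because the single B observation counts as much as all three A observations together.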

Lines starting with var describe features (also called variables). For example:

var 6 Otu123 0.00821

Fields are:

1. var
2. index of the variable (0, 1, 2...)
3. name of the variable (typically the OTU name)
4. importance of the variable
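A slightly more defensive way to read these lines is to select them by their first field with awk and check the field count. The function name otu_importance is hypothetical; the sketch assumes the four tab-separated fields listed above and reports any var line with a different number of fields on stderr rather than guessing which column holds the importance.

```shell
# Hypothetical sketch: print "name<TAB>importance" for each var line,
# most important first. Assumes 4 tab-separated fields per var line.
otu_importance() {
    awk -F'\t' '
        $1 == "var" {
            if (NF != 4) {
                print "skipping line " NR ": " NF " fields" > "/dev/stderr"
                next
            }
            print $3 "\t" $4
        }' "$1" | sort -t "$(printf '\t')" -k2,2gr
}
```

For example, otu_importance forest.txt | head -n 20 would list the 20 most important OTUs.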
 
To extract the OTU importance values from a forest parameter file and sort them in order of decreasing importance, you could use:

grep -w "^var" forest.txt | cut -f3,4 | sort -rgk2