USEARCH v11

otutab_forest_kfold command

See also
  random forest classifiers
  k-fold cross-validation
  otutab_forest_train command
  otutab_forest_classify command

Performs k-fold cross-validation of random forest classifiers on an OTU table with known categories.

A metadata file must be specified with the -meta option.

In each of k iterations, the samples are split into a test subset and a training subset. A classifier is trained on the training subset and its accuracy is measured on the test subset.

The number of iterations k is given by the -tries option. Default 6.

By default, the test set is a random subset containing 1/k of the samples and the training set is the remaining (k - 1)/k of the samples. This can be changed with the -testpct option, which specifies the size of the test set as a percentage of the samples. For example, -tries 5 -testpct 10 performs five iterations in which the test set is 10% of the samples.
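
The split sizes described above can be illustrated with a short Python sketch. This is not USEARCH's implementation: the function name kfold_splits, the rounding rule, the fixed seed and the choice to draw an independent random subset in every iteration are all assumptions made for illustration.

```python
import random

def kfold_splits(samples, tries=6, testpct=None, seed=0):
    """Yield (train, test) lists, one pair per iteration.

    Hypothetical helper: mirrors the -tries / -testpct description only.
    """
    rng = random.Random(seed)
    n = len(samples)
    # Default test fraction is 1/k; testpct overrides it as a percentage.
    frac = 1.0 / tries if testpct is None else testpct / 100.0
    # Assumption: round to the nearest whole sample, at least one test sample.
    n_test = max(1, round(n * frac))
    for _ in range(tries):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # First n_test shuffled samples form the test set, the rest train.
        yield shuffled[n_test:], shuffled[:n_test]

samples = [f"S{i}" for i in range(60)]
for train, test in kfold_splits(samples, tries=5, testpct=10):
    print(len(train), len(test))
```

With 60 samples, -tries 5 -testpct 10 corresponds to five iterations with 6 test samples and 54 training samples each under these assumptions.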

The -trees option specifies the number of trees in the forest. Default 100. Increasing the number of trees may improve accuracy on unusually complex datasets at the expense of slower execution times for training and classification. In my experience, 100 is enough for typical 16S experiments. You can check by comparing the training accuracy with different numbers of trees.
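
One way to run that comparison is to generate one command line per forest size and inspect the resulting output files. The sketch below only builds and prints the commands. The tree counts (50, 100, 200), the output file names and the helper kfold_command are hypothetical; the options themselves are the ones described on this page.

```python
def kfold_command(otutab, meta, trees, out):
    """Return the argument list for one k-fold run with a given forest size."""
    return [
        "usearch", "-otutab_forest_kfold", otutab,
        "-meta", meta,
        "-trees", str(trees),
        "-tabbedout", out,
    ]

# Hypothetical forest sizes and output names, chosen for illustration.
commands = [
    kfold_command("otutab.txt", "meta.txt", t, f"results_trees{t}.txt")
    for t in (50, 100, 200)
]

for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run to execute the command if usearch is on the PATH.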

The -tabbedout option specifies a k-fold validation tabbed output file.

Example

usearch -otutab_forest_kfold otutab.txt -meta meta.txt -tabbedout results.txt