Random forest classifiers
The otutab_forest_train command trains a random forest classifier on the samples in an OTU table, using the known category of each sample as given in a metadata file.
A metadata file must be specified using the -meta option.
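Before training, each sample must be turned into a feature vector (its OTU counts) paired with its known category. The sketch below shows that pairing for a tab-separated OTU table whose first row holds the sample names after an OTU ID column. The metadata layout (one `sample<TAB>category` line per sample, no header) and the function name `load_training_data` are assumptions for illustration; this is not how USEARCH parses its inputs internally.

```python
import csv
import io

def load_training_data(otutab_text, meta_text):
    """Return (samples, otus, X, y): X[i][j] is the count of OTU j in
    sample i, and y[i] is the known category of sample i."""
    rows = list(csv.reader(io.StringIO(otutab_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]                  # first column holds the OTU ids
    otus = [r[0] for r in body]
    # ASSUMED metadata layout: "sample<TAB>category" on each line.
    categories = {}
    for line in meta_text.splitlines():
        if line.strip():
            name, cat = line.split("\t")[:2]
            categories[name] = cat
    # Transpose: the classifier needs one feature vector per sample (column).
    X = [[float(r[j + 1]) for r in body] for j in range(len(samples))]
    y = [categories[s] for s in samples]  # KeyError if a sample has no category
    return samples, otus, X, y

otutab = "#OTU ID\tS1\tS2\tS3\nOtu1\t10\t0\t5\nOtu2\t3\t8\t0\n"
meta = "S1\tgut\nS2\tskin\nS3\tgut\n"
samples, otus, X, y = load_training_data(otutab, meta)
print(X, y)
```

Every sample in the table needs a category; failing loudly on a missing one is safer than silently training on fewer samples.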
The -randseed option specifies a random number seed, which must be a non-negative integer. By default, the seed is derived from the time of day and the operating system process ID, so it will almost always differ each time the command is executed. Set a fixed seed to get reproducible results, e.g. -randseed 1.
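The seeding rule described above can be mimicked in a few lines. This is a minimal sketch, assuming a hypothetical helper `make_rng` that returns a Python `random.Random`; it illustrates the behaviour (fixed seed gives repeatable draws, no seed gives a time/PID-based one) and is not USEARCH's own generator.

```python
import os
import random
import time

def make_rng(randseed=None):
    """Hypothetical helper: use the given non-negative integer seed, or
    derive one from the time of day and the process id when none is given."""
    if randseed is None:
        seed = int(time.time() * 1e6) ^ os.getpid()
    else:
        if not isinstance(randseed, int) or randseed < 0:
            raise ValueError("seed must be a non-negative integer")
        seed = randseed
    return random.Random(seed)

# Two "runs" with the same fixed seed draw the same bootstrap indices.
run1 = make_rng(1).choices(range(10), k=5)
run2 = make_rng(1).choices(range(10), k=5)
print(run1 == run2)
```

Because every random choice in training (bootstrap resampling, feature subsets) flows from this one generator, fixing the seed makes the whole forest repeatable.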
The -trees option specifies the number of trees in the forest. The default is 100. Increasing the number of trees may improve accuracy on unusually complex datasets, at the expense of slower training and classification. In my experience, 100 is enough for typical 16S experiments. You can check this by comparing the training accuracy obtained with different numbers of trees.
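To see what the tree count controls, here is a small stand-alone random forest with the usual ingredients (bootstrap resampling per tree, a random subset of about sqrt(#features) OTUs per tree, majority vote) followed by the accuracy comparison suggested above. For brevity each tree is a decision stump (a depth-one tree), a deliberate simplification; the function names are hypothetical and this is not USEARCH's algorithm.

```python
import random
from collections import Counter

def train_stump(X, y, features):
    """Return (feature, threshold, left_category, right_category) for the
    split among `features` that misclassifies the fewest training samples."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0] if right else lmaj
            errors = sum(lab != lmaj for lab in left) + sum(lab != rmaj for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lmaj, rmaj)
    return best[1:]

def train_forest(X, y, n_trees, rng):
    """Each tree sees a bootstrap resample of the samples and a random
    subset of about sqrt(#features) features."""
    n_samples, n_feat = len(X), len(X[0])
    n_try = max(1, int(n_feat ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        feats = rng.sample(range(n_feat), n_try)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(lm if row[f] <= t else rm for f, t, lm, rm in forest)
    return votes.most_common(1)[0][0]

def training_accuracy(forest, X, y):
    return sum(predict(forest, row) == lab for row, lab in zip(X, y)) / len(y)

# Toy data: 3 "OTUs"; the category depends on the first two.
data_rng = random.Random(0)
X = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(60)]
y = ["A" if row[0] + row[1] > 10 else "B" for row in X]
for n in (1, 10, 100):
    forest = train_forest(X, y, n, random.Random(1))
    print(n, training_accuracy(forest, X, y))
```

Training time grows linearly with the number of trees, and so does classification time, since every tree votes on every sample.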
The trained forest parameters are saved to the file specified by the -forestout option. This forest can then be used to predict categories for new data with the forest_classify or otutab_forest_classify command.
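The train-once, classify-later workflow amounts to serializing the forest and reloading it to vote on new samples. The sketch below uses a hypothetical JSON layout with stump-shaped trees `(feature_index, threshold, left_category, right_category)`; the real file written by -forestout has its own format, which this does not reproduce.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical file format: {"trees": [[feature, threshold, left, right], ...]}
def save_forest(forest, path):
    with open(path, "w") as fh:
        json.dump({"trees": [list(tree) for tree in forest]}, fh)

def load_forest(path):
    with open(path) as fh:
        return [tuple(tree) for tree in json.load(fh)["trees"]]

def classify(forest, row):
    """Predict a category for one new sample by majority vote of the trees."""
    votes = Counter(lm if row[f] <= t else rm for f, t, lm, rm in forest)
    return votes.most_common(1)[0][0]

forest = [(0, 2.0, "gut", "skin"), (1, 5.0, "skin", "gut"), (0, 3.0, "gut", "skin")]
path = os.path.join(tempfile.mkdtemp(), "forest.json")
save_forest(forest, path)
loaded = load_forest(path)
print(loaded == forest, classify(loaded, [1.0, 9.0]), classify(loaded, [10.0, 0.0]))
```

A new sample must list its OTU counts in the same order as the training table, since each tree refers to features by position.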
usearch -otutab_forest_train otutab.txt -meta meta.txt -forestout