cl-random-forest is an implementation of Random Forest for multiclass classification written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun. "Global Refinement of Random Forest," CVPR 2015). This refinement makes the model faster and more accurate than a standard Random Forest.
Features and Limitations
- Faster and more accurate than other major implementations such as scikit-learn (Python/Cython) or ranger (R/C++)

| Dataset | scikit-learn | ranger | cl-random-forest |
|---------|--------------|--------|------------------|
| MNIST   | 96.95%, 41.72sec | 97.17%, 69.34sec | 98.29%, 12.68sec |
| letter  | 96.38%, 2.569sec | 96.42%, 1.828sec | 97.32%, 3.497sec |
| covtype | 94.89%, 263.7sec | 83.95%, 139.0sec | 96.01%, 103.9sec |
| usps    | 93.47%, 3.583sec | 93.57%, 11.70sec | 94.96%, 0.686sec |
- Supports parallelization of training and prediction (SBCL only)
- Includes a Global Pruning algorithm for Random Forests, which can make the model considerably more compact
- Currently only classification is implemented; regression is not yet available
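To use the parallelization mentioned above, a worker pool has to be set up before training or prediction. A minimal sketch, assuming cl-random-forest parallelizes via the lparallel library (check the library's documentation for the exact mechanism):

```lisp
;; Assumption: cl-random-forest uses lparallel for parallelism.
;; Create a kernel with one worker per CPU core before calling
;; the training/prediction functions; adjust 4 to your core count.
(setf lparallel:*kernel* (lparallel:make-kernel 4))
```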
In quicklisp's local-projects directory,

```
git clone https://github.com/masatoi/cl-online-learning.git
git clone https://github.com/masatoi/cl-random-forest.git
```

When using Roswell,

```
ros install masatoi/cl-online-learning masatoi/cl-random-forest
```
Prepare training dataset
A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array in which each row corresponds to one datum. Note that the target is an integer starting from 0. For example, the following dataset is valid for 4-class classification with 2-dimensional input.
```lisp
(defparameter *target*
  (make-array 11 :element-type 'fixnum
                 :initial-contents '(0 0 1 1 2 2 2 3 3 3 3)))

(defparameter *datamatrix*
  (make-array '(11 2)
              :element-type 'double-float
              :initial-contents '((-1.0d0 -2.0d0)
                                  (-2.0d0 -1.5d0)
                                  ( 1.0d0 -2.0d0)
                                  ( 3.0d0 -1.5d0)
                                  (-2.0d0  2.0d0)
                                  (-3.0d0  1.0d0)
                                  (-2.0d0  1.0d0)
                                  ( 3.0d0  2.0d0)
                                  ( 2.0d0  2.0d0)
                                  ( 1.0d0  2.0d0)
                                  ( 1.0d0  1.0d0))))
```
Make Decision Tree
To construct a decision tree, the MAKE-DTREE function is available. This function receives the number of classes, the data matrix, and the target vector, and returns a decision tree object. It also optionally receives the maximum depth of the tree, the minimum number of samples in a region the tree divides, and the number of split trials.
```lisp
(defparameter *n-class* 4)

(defparameter *dtree*
  (make-dtree *n-class* *datamatrix* *target*
              :max-depth 5 :min-region-samples 1 :n-trial 10))
```
Next, make a prediction from the constructed decision tree with the PREDICT-DTREE function. For example, to predict the class of the first datum in the data matrix, do as follows.
```lisp
(predict-dtree *dtree* *datamatrix* 0)
;; => 0 (correct class id)
```
To make predictions for the entire dataset and calculate the accuracy, use the TEST-DTREE function.
```lisp
(test-dtree *dtree* *datamatrix* *target*)
;; Accuracy: 100.0%, Correct: 11, Total: 11
```
Make Random Forest
To construct a random forest, the MAKE-FOREST function is available. In addition to the MAKE-DTREE arguments, it optionally receives the number of decision trees and the bagging ratio used for sampling from the training data to construct each decision tree.
```lisp
(defparameter *forest*
  (make-forest *n-class* *datamatrix* *target*
               :n-tree 10 :bagging-ratio 1.0
               :max-depth 5 :min-region-samples 1 :n-trial 10))
```
Prediction and testing of a random forest are done in almost the same way as for decision trees: the PREDICT-FOREST function and the TEST-FOREST function are available for each purpose.
```lisp
(predict-forest *forest* *datamatrix* 0)
;; => 0 (correct class id)

(test-forest *forest* *datamatrix* *target*)
;; Accuracy: 100.0%, Correct: 11, Total: 11
```
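The Global Refinement and Global Pruning features mentioned above build on a trained forest: a linear learner is fitted on the forest's leaf-index features (via cl-online-learning), and the forest can then be pruned against it. The following is only a sketch; the names MAKE-REFINE-LEARNER, MAKE-REFINE-DATASET, TRAIN-REFINE-LEARNER, TEST-REFINE-LEARNER, and PRUNING! are assumptions here, so check the library's documentation for the actual API.

```lisp
;; Assumed API: build a refinement learner over the trained forest
;; and a refined dataset of leaf-index features for the training data.
(defparameter *forest-learner* (make-refine-learner *forest*))
(defparameter *forest-refine-dataset*
  (make-refine-dataset *forest* *datamatrix*))

;; Train and evaluate the global-refinement learner.
(train-refine-learner *forest-learner* *forest-refine-dataset* *target*)
(test-refine-learner  *forest-learner* *forest-refine-dataset* *target*)

;; Assumed API: globally prune the forest, discarding the fraction
;; of leaves (here 0.1) that contribute least to the refined model.
(pruning! *forest* *forest-learner* 0.1)
```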
Satoshi Imai (email@example.com)
This software is released under the MIT License, see LICENSE.txt.