cl-random-forest

2017-08-30

cl-random-forest is an implementation of Random Forest for multiclass classification written in Common Lisp. It also includes an implementation of Global Refinement of Random Forest (Ren, Cao, Wei and Sun. "Global Refinement of Random Forest", CVPR 2015). This refinement makes the forest faster and more accurate than a standard Random Forest.

Features and Limitations

  • Faster and more accurate than other major implementations such as scikit-learn (Python/Cython) and ranger (R/C++). Accuracy and training time on four datasets:

                 scikit-learn        ranger              cl-random-forest
    MNIST        96.95%, 41.72sec    97.17%, 69.34sec    98.29%, 12.68sec
    letter       96.38%, 2.569sec    96.42%, 1.828sec    97.32%, 3.497sec
    covtype      94.89%, 263.7sec    83.95%, 139.0sec    96.01%, 103.9sec
    usps         93.47%, 3.583sec    93.57%, 11.70sec    94.96%, 0.686sec

  • Supports parallelization of training and prediction (SBCL only; see the sketch after this list)

  • Includes a Global Pruning algorithm for Random Forests, which can make the model extremely compact

  • Currently only classification is available; regression is not yet implemented
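
cl-random-forest's parallelism appears to be built on lparallel (an assumption about the backend worth verifying in the system definition). Under that assumption, training and prediction run in parallel whenever an lparallel kernel is bound:

(ql:quickload :lparallel)

;; Bind a kernel with 4 worker threads to enable parallelism.
;; NOTE: using LPARALLEL:*KERNEL* as the on/off switch is an assumption.
(setf lparallel:*kernel* (lparallel:make-kernel 4))

;; Rebinding the kernel to NIL makes everything run single-threaded again.
;; (setf lparallel:*kernel* nil)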

Installation

In quicklisp's local-projects directory,

git clone https://github.com/masatoi/cl-online-learning.git
git clone https://github.com/masatoi/cl-random-forest.git

In Lisp,

(ql:quickload :cl-random-forest)

When using Roswell,

ros install masatoi/cl-online-learning masatoi/cl-random-forest

Usage

Classification

Prepare training dataset

A dataset consists of a target vector and an input data matrix. For classification, the target vector should be a fixnum simple-vector and the data matrix should be a 2-dimensional double-float array, each of whose rows corresponds to one datum. Note that the targets are integers starting from 0. For example, the following dataset is valid for 4-class classification with 2-dimensional input (a conversion sketch for data that starts out as lists follows the example).

(defparameter *target*
  (make-array 11 :element-type 'fixnum
                 :initial-contents '(0 0 1 1 2 2 2 3 3 3 3)))

(defparameter *datamatrix*
  (make-array '(11 2)
              :element-type 'double-float
              :initial-contents '((-1.0d0 -2.0d0)
                                  (-2.0d0 -1.5d0)
                                  (1.0d0 -2.0d0)
                                  (3.0d0 -1.5d0)
                                  (-2.0d0 2.0d0)
                                  (-3.0d0 1.0d0)
                                  (-2.0d0 1.0d0)
                                  (3.0d0 2.0d0)
                                  (2.0d0 2.0d0)
                                  (1.0d0 2.0d0)
                                  (1.0d0 1.0d0))))
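
Real-world data often starts out as plain lists and must be coerced into these specialized arrays. The helpers below are a minimal sketch of such a conversion; LIST->DATAMATRIX and LIST->TARGET are hypothetical names, not part of the library:

(defun list->datamatrix (rows)
  "Coerce a list of equal-length lists of reals into a
2-dimensional double-float array with one row per datum."
  (let* ((n (length rows))
         (dim (length (first rows)))
         (matrix (make-array (list n dim) :element-type 'double-float)))
    (loop for row in rows
          for i from 0
          do (loop for x in row
                   for j from 0
                   do (setf (aref matrix i j) (coerce x 'double-float))))
    matrix))

(defun list->target (class-ids)
  "Coerce a list of class ids (integers starting from 0)
into a fixnum simple-vector."
  (make-array (length class-ids) :element-type 'fixnum
                                 :initial-contents class-ids))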

Make Decision Tree

To construct a decision tree, the MAKE-DTREE function is available. This function receives the number of classes, the data matrix and the target vector, and returns a decision tree object. It also accepts optional keyword arguments: the maximum depth of the tree, the minimum number of samples in a region the tree divides, and the number of split trials.

(defparameter *n-class* 4)

(defparameter *dtree*
  (make-dtree *n-class* *datamatrix* *target*
              :max-depth 5 :min-region-samples 1 :n-trial 10))

Next, make a prediction with the constructed decision tree using the PREDICT-DTREE function. For example, to predict the class of the first datum in the data matrix, do as follows.

(predict-dtree *dtree* *datamatrix* 0)
;; => 0 (correct class id)

To make predictions for the entire dataset and calculate the accuracy, use the TEST-DTREE function.

(test-dtree *dtree* *datamatrix* *target*)
;; Accuracy: 100.0%, Correct: 11, Total: 11

Make Random Forest

To construct a random forest, the MAKE-FOREST function is available. In addition to the MAKE-DTREE arguments, it optionally receives the number of decision trees and the bagging ratio used for sampling from the training data to construct each tree.

(defparameter *forest*
  (make-forest *n-class* *datamatrix* *target*
               :n-tree 10 :bagging-ratio 1.0
               :max-depth 5 :min-region-samples 1 :n-trial 10))

Prediction and testing of a random forest are done in almost the same way as for decision trees: the PREDICT-FOREST and TEST-FOREST functions serve each purpose.

(predict-forest *forest* *datamatrix* 0)
;; => 0 (correct class id)

(test-forest *forest* *datamatrix* *target*)
;; Accuracy: 100.0%, Correct: 11, Total: 11

Author

Satoshi Imai (satoshi.imai@gmail.com)

License

This software is released under the MIT License, see LICENSE.txt.
