library(MASS)
library(randomForest)
library(caret)
R Packages

In this document, we will compare Random Forests and a similar method called Extremely Randomized Trees, which can be found in the R package extraTrees. The extraTrees package uses Java in the background and sometimes has memory issues. The command below modifies the Java back-end to be given more memory by default. (By default the Java Virtual Machine is allocated 512 MB, which we change to 4 GB.) This must be done before loading the extraTrees package.
options(java.parameters = "-Xmx4g")
library(extraTrees)
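If you want to verify that the setting took effect, one option, assuming the rJava package is installed (the object name rt below is just for illustration), is to ask the JVM directly for its maximum heap size:

library(rJava)
.jinit()
# ask the running JVM for its maximum heap size; should be roughly 4 GB here
rt = .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
.jcall(rt, "J", "maxMemory") / 1024 ^ 3  # in gigabytes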
Details on the R package can be found in its vignette. We will also discuss ranger, an alternative package for fitting a random forest, as well as xgboost, an alternative boosting package.
Extremely Randomized Trees (ERT) are very similar to Random Forests (RF). There are essentially two main differences:

- As with RF, only a random subset of the predictors is considered at each split, but each tree is grown on the full training sample rather than a bootstrap resample. (The size of the subset is still controlled by mtry.)
- Rather than searching for the best possible cut-point for each candidate predictor, only a small number of randomly chosen cut-points are evaluated, and the best of those is used. (The number of random cut-points is controlled by numRandomCuts.)

The resulting “forest” contains trees that are more variable, but less correlated, than the trees in a Random Forest. Details of the method can be found in the original paper.
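To make the second difference concrete, here is a toy sketch, separate from either package, that contrasts the two splitting strategies for a single numeric predictor. The helper split_sse and the object names are made up for illustration; the packages do this internally, in compiled code.

# sum of squared errors if we split x at cut-point cut
split_sse = function(x, y, cut) {
  left  = y[x <= cut]
  right = y[x >  cut]
  sum((left - mean(left)) ^ 2) + sum((right - mean(right)) ^ 2)
}

set.seed(42)
x = runif(100)
y = 2 * (x > 0.5) + rnorm(100, sd = 0.25)

# RF-style: evaluate every midpoint between sorted unique values, keep the best
x_sorted = sort(unique(x))
cuts_all = head(x_sorted, -1) + diff(x_sorted) / 2
rf_cut   = cuts_all[which.min(sapply(cuts_all, split_sse, x = x, y = y))]

# ERT-style: evaluate only numRandomCuts random cut-points, keep the best of those
num_random_cuts = 1
cuts_rand = runif(num_random_cuts, min(x), max(x))
ert_cut   = cuts_rand[which.min(sapply(cuts_rand, split_sse, x = x, y = y))]

c(rf_cut = rf_cut, ert_cut = ert_cut)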
As with most papers introducing a new method, the claim is that Extremely Randomized Trees are better than Random Forests. In practice, you will find this is sometimes true, but certainly not always. Remember, there is no free lunch.
ERT can be used for both classification and regression, much like RF. We will evaluate a regression example in this document.
We consider the regression case, using the Boston data from the MASS package. We will use RMSE as our metric, so we write a function which will help us along the way:
rmse = function(actual, predicted) {
sqrt(mean((actual - predicted) ^ 2))
}
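As a quick sanity check of this helper:

rmse(actual = c(1, 2, 3), predicted = c(1, 2, 4))  # sqrt(1 / 3), about 0.577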
As always, we test-train split the data. Half for training, half for testing in this case.
set.seed(42)
boston_idx = sample(1:nrow(Boston), nrow(Boston) / 2)
boston_trn = Boston[boston_idx,]
boston_tst = Boston[-boston_idx,]
Notice that this dataset contains 13 predictor variables.
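A quick check confirms this, since medv is the response:

dim(boston_trn)   # 253 rows, 14 columns (13 predictors plus medv)
ncol(Boston) - 1  # 13 predictors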
We first train a Random Forest model. For this example, we will use cross-validation to select a value of mtry, the tuning parameter for RF. Two reasons for this:

- Cross-validated error estimates are directly comparable across the different methods we fit in this document.
- OOB error, which could be used to tune RF more quickly, is not available for the extraTrees package. So we'll use CV for both to keep the comparison as similar as possible.

We set up both our cross-validation (5 fold) and a grid of mtry values. (Here, trying all possible values.)
cv_5 = trainControl(method = "cv", number = 5)
rf_grid = expand.grid(mtry = 1:13)
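As an aside, for the RF model alone, mtry could instead be tuned with OOB error, avoiding the repeated fitting that cross-validation requires. A sketch of that alternative, with oob_control and rf_oob_fit being names introduced here:

# OOB-based tuning is supported by caret for method = "rf", but not for "extraTrees"
oob_control = trainControl(method = "oob")
rf_oob_fit = train(medv ~ ., data = boston_trn,
                   method = "rf",
                   trControl = oob_control,
                   tuneGrid = rf_grid)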
We then train the model.
set.seed(42)
rf_fit = train(medv ~ ., data = boston_trn,
method = "rf",
trControl = cv_5,
tuneGrid = rf_grid)
We suppress the bulk of the output and only view the selected tuning parameters and a plot of our results.
#rf_fit
rf_fit$bestTune
## mtry
## 7 7
plot(rf_fit)
rmse(predict(rf_fit, boston_tst), boston_tst$medv)
## [1] 4.122169
We find the resulting test RMSE for our chosen RF model, with mtry = 7, to be 4.122169.
We now try an Extremely Randomized Trees model. The ERT model has two tuning parameters:

- mtry, which works in the same way as in RF.
- numRandomCuts, which determines the number of randomly chosen splits considered for each of the mtry predictors selected at each split. Lower values make the trees more random.

When specifying the grid of values, we use only selected values of mtry and only “small” values of numRandomCuts to keep computation time somewhat reasonable. (Remember, we’re cross-validating, which takes more time than using OOB samples.) Usually numRandomCuts would be kept smaller than these values, say 1:5, but larger values were chosen here for the plot below. A value of 1 corresponds to the originally specified ERT method.
et_grid = expand.grid(mtry = 4:7, numRandomCuts = 1:10)
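This grid contains 4 * 10 = 40 parameter combinations, so 5-fold cross-validation will fit 200 models before the final refit:

nrow(et_grid)  # 40 combinations, times 5 folds = 200 fits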
We train the model using caret with method = "extraTrees", which uses the extraTrees package. When training the model, we add one extra argument, numThreads = 4, which tells R to use 4 cores in the Java Virtual Machine. (This will speed up computation.)
set.seed(42)
et_fit = train(medv ~ ., data = boston_trn,
method = "extraTrees",
trControl = cv_5,
tuneGrid = et_grid,
numThreads = 4)
Again, we suppress the bulk of the output and only view the selected tuning parameters and a plot of our results.
#et_fit
et_fit$bestTune
## mtry numRandomCuts
## 33 7 3
plot(et_fit)
rmse(predict(et_fit, boston_tst), boston_tst$medv)
## [1] 3.882574
We find the resulting test RMSE for our chosen ERT model to be 3.882574. So, for this example, Extremely Randomized Trees win over Random Forests, but remember this won’t always be the case.
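For reference, the same type of model can be fit without caret by calling extraTrees() directly on an x matrix and y vector. A sketch using the tuning values selected above (the object names here are just for illustration, and results will vary slightly between runs since the Java RNG is not controlled by set.seed):

# direct extraTrees() call with mtry = 7 and numRandomCuts = 3
x_trn = as.matrix(boston_trn[, -which(names(boston_trn) == "medv")])
x_tst = as.matrix(boston_tst[, -which(names(boston_tst) == "medv")])
et_direct = extraTrees(x = x_trn, y = boston_trn$medv,
                       mtry = 7, numRandomCuts = 3, numThreads = 4)
rmse(predict(et_direct, x_tst), boston_tst$medv)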
ranger

The ranger package re-implements the random forest method in C++. It has a number of speed advantages, including the ability to grow trees in parallel.
library(ranger)
set.seed(42)
system.time({ranger_fit = train(medv ~ ., data = boston_trn,
method = "ranger",
trControl = cv_5,
num.threads = 1,
tuneGrid = rf_grid)})
## user system elapsed
## 9.84 0.50 10.17
Increasing num.threads to 4 allows ranger to grow trees in parallel, which reduces the elapsed time:
set.seed(42)
system.time({ranger_fit = train(medv ~ ., data = boston_trn,
method = "ranger",
trControl = cv_5,
num.threads = 4,
tuneGrid = rf_grid)})
## user system elapsed
## 10.22 0.34 3.81
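Before choosing num.threads, you can check how many cores your machine has:

parallel::detectCores()  # number of cores available for num.threads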
#ranger_fit
ranger_fit$bestTune
## mtry
## 7 7
plot(ranger_fit)
rmse(predict(ranger_fit, boston_tst), boston_tst$medv)
## [1] 4.253865
Notice that, due to differences in the implementation, the results are not identical to those from the original random forest. In general, the results should be similar.
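For completeness, ranger can also be called directly, without caret. A sketch using the selected mtry, with ranger_direct being a name introduced here:

# direct ranger() call; predictions live in the $predictions element of predict()
ranger_direct = ranger(medv ~ ., data = boston_trn,
                       mtry = 7, num.trees = 500, num.threads = 4, seed = 42)
rmse(predict(ranger_direct, data = boston_tst)$predictions, boston_tst$medv)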
xgboost

The xgboost package implements eXtreme Gradient Boosting, which is similar to the boosting methods found in gbm. Tuned well, xgboost can often obtain excellent results, and it has frequently been used in winning Kaggle entries. (In this example it beats gbm, but not the random forest based methods.)
library(gbm)
library(xgboost)
set.seed(42)
gbm_fit = train(medv ~ ., data = boston_trn,
method = "gbm",
trControl = cv_5,
verbose = FALSE,
tuneLength = 10)
#gbm_fit
gbm_fit$bestTune
## n.trees interaction.depth shrinkage n.minobsinnode
## 71 50 8 0.1 10
plot(gbm_fit)
rmse(predict(gbm_fit, boston_tst), boston_tst$medv)
## [1] 4.488456
set.seed(42)
xgb_fit = train(medv ~ ., data = boston_trn,
method = "xgbTree",
trControl = cv_5,
verbose = FALSE,
tuneLength = 10)
#xgb_fit
xgb_fit$bestTune
## nrounds max_depth eta gamma colsample_bytree min_child_weight
## 373 150 9 0.4 0 0.8 1
plot(xgb_fit)
rmse(predict(xgb_fit, boston_tst), boston_tst$medv)
## [1] 4.003484
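Finally, if you want more control than caret's xgbTree interface offers, xgboost can also be fit directly on a numeric matrix. A rough sketch, borrowing a few of the tuning values selected above rather than reproducing the full caret fit (the objective name below assumes a reasonably recent xgboost version):

# direct xgboost() call on a numeric matrix, roughly following the caret results
x_trn = as.matrix(boston_trn[, -which(names(boston_trn) == "medv")])
x_tst = as.matrix(boston_tst[, -which(names(boston_tst) == "medv")])
set.seed(42)
xgb_direct = xgboost(data = x_trn, label = boston_trn$medv,
                     nrounds = 150, max_depth = 9, eta = 0.4,
                     objective = "reg:squarederror", verbose = 0)
rmse(predict(xgb_direct, x_tst), boston_tst$medv)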