Machine learning projects commonly require a user to “tune” a model’s hyperparameters to find a good balance between bias and variance. Several tools are available in a data scientist’s toolbox to handle this task, the most blunt of which is a grid search. A grid search gauges model performance over a pre-defined set of hyperparameters without regard for past performance. As models increase in complexity and training time, grid searches become unwieldy.
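For contrast, a grid search over two of the parameters tuned later might look like the sketch below (evalParams is a hypothetical helper standing in for any function that trains a model and returns a metric):

# Every combination in the grid is scored, regardless of how
# previous combinations performed.
grid <- expand.grid(
    max_depth = c(2L, 5L, 8L)
  , subsample = c(0.50, 0.75, 1.00)
)
# scores <- mapply(evalParams, grid$max_depth, grid$subsample)
# grid[which.max(scores), ]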
Ideally, we would use the information from prior model evaluations to guide our future parameter searches. This is precisely the idea behind Bayesian Optimization, in which our prior response distribution is iteratively updated based on our best guess of where the best parameters are. The ParBayesianOptimization package does exactly this in the following process:

1. Run the scoring function on an initial set of parameters to obtain parameter-score pairs.
2. Fit a Gaussian process to the parameter-score pairs collected so far.
3. Find the global optimum of the acquisition function, which proposes the next parameter set to try.
4. Run the scoring function on the proposed parameters, and repeat steps 2-4 until a stopping criterion is met.
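In pseudo-R, the loop looks like the following sketch. Every helper here is a hypothetical placeholder; bayesOpt handles all of these steps internally:

# Conceptual outline only; all helpers below are hypothetical.
# pairs <- scoreInitialDesign()               # step 1
# for (i in seq_len(iters)) {
#   gp      <- fitGP(pairs)                   # step 2
#   nextPar <- maximizeAcq(gp)                # step 3
#   pairs   <- rbind(pairs, score(nextPar))   # step 4
# }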
In this example, we will be using the agaricus.train dataset provided in the xgboost package. Here, we load the packages and data, and create a Folds list to be used in the scoring function.
library("xgboost")
library("ParBayesianOptimization")
data(agaricus.train, package = "xgboost")
Folds <- list(
    Fold1 = as.integer(seq(1, nrow(agaricus.train$data), by = 3))
  , Fold2 = as.integer(seq(2, nrow(agaricus.train$data), by = 3))
  , Fold3 = as.integer(seq(3, nrow(agaricus.train$data), by = 3))
)
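As a quick sanity check, we can confirm that the three folds partition every row exactly once:

# Every row index should appear in exactly one fold.
length(unlist(Folds)) == nrow(agaricus.train$data)  # TRUE
anyDuplicated(unlist(Folds)) == 0                   # TRUE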
Now we need to define the scoring function. This function should, at a minimum, return a list with a Score element, which is the model evaluation metric we want to maximize. We can also retain other pieces of information created by the scoring function by including them as named elements of the returned list. In this case, we want to retain the optimal number of rounds determined by xgb.cv:
scoringFunction <- function(max_depth, min_child_weight, subsample) {

  dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

  Pars <- list(
      booster = "gbtree"
    , eta = 0.01
    , max_depth = max_depth
    , min_child_weight = min_child_weight
    , subsample = subsample
    , objective = "binary:logistic"
    , eval_metric = "auc"
  )

  xgbcv <- xgb.cv(
      params = Pars
    , data = dtrain
    , nrounds = 100
    , folds = Folds
    , prediction = TRUE
    , showsd = TRUE
    , early_stopping_rounds = 5
    , maximize = TRUE
    , verbose = 0
  )

  # Score is the metric bayesOpt maximizes; nrounds is retained so
  # we know the optimal boosting rounds for each parameter set.
  return(
    list(
        Score = max(xgbcv$evaluation_log$test_auc_mean)
      , nrounds = xgbcv$best_iteration
    )
  )
}
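It can be worthwhile to run the scoring function once by hand before starting the optimization; any values inside the bounds defined below will do (the ones here are arbitrary):

# One-off call to verify the function runs and returns the expected list.
scoringFunction(max_depth = 5, min_child_weight = 3, subsample = 0.8)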
Some other objects we need to define are the bounds, GP kernel, and acquisition function. In this example, the kernel and acquisition function are left as the default.

- bounds will tell our process its search space.
- The kernel is passed to the GauPro function GauPro_kernel_model and defines the covariance function.
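We also need to define bounds itself. The exact limits below are an assumption chosen to be consistent with the parameter values that appear in the score summary later (min_child_weight reaches 1, and subsample reaches both 0.25 and 1):

# Search space for each hyperparameter. The L suffix marks integer
# bounds, so bayesOpt will only try whole numbers for max_depth.
# Upper limits here are assumptions consistent with the sampled values.
bounds <- list(
    max_depth = c(2L, 10L)
  , min_child_weight = c(1, 25)
  , subsample = c(0.25, 1)
)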
We are now ready to put this all into the bayesOpt function.
set.seed(1234)

optObj <- bayesOpt(
    FUN = scoringFunction
  , bounds = bounds
  , initPoints = 4
  , iters.n = 3
)
#>
#> Running initial scoring function 4 times in 1 thread(s)...
The console informs us that the process initialized by running scoringFunction 4 times. It then fit a Gaussian process to the parameter-score pairs, found the global optimum of the acquisition function, and ran scoringFunction again. This process continued until we had 7 parameter-score pairs. You can interrogate the bayesOpt object to see the results:
optObj$scoreSummary
#> Epoch Iteration max_depth min_child_weight subsample gpUtility acqOptimum inBounds Elapsed Score nrounds errorMessage
#> <num> <int> <num> <num> <num> <num> <lgcl> <lgcl> <num> <num> <int> <lgcl>
#> 1: 0 1 9 5.863591 0.2585819 NA FALSE TRUE 0.116 0.9984374 11 NA
#> 2: 0 2 4 10.154185 0.5230172 NA FALSE TRUE 0.098 0.9977909 7 NA
#> 3: 0 3 6 24.487949 0.8622225 NA FALSE TRUE 0.382 0.9988232 52 NA
#> 4: 0 4 2 17.988070 0.6821260 NA FALSE TRUE 0.083 0.9876198 10 NA
#> 5: 1 5 2 7.652234 1.0000000 0.8147873 TRUE TRUE 0.069 0.9871588 8 NA
#> 6: 2 6 9 7.992080 0.2843361 0.7111546 TRUE TRUE 0.092 0.9977846 7 NA
#> 7: 3 7 9 1.000000 0.2500000 0.8122487 TRUE TRUE 0.101 0.9999503 9 NA
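Rather than reading the best result off the table manually, the package's getBestPars function returns the parameter set with the highest Score observed so far:

# Extract the parameter set associated with the highest Score.
getBestPars(optObj)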