Supervised Learning II

Hyperparameter Tuning

Goal

After this exercise, you should be able to define search spaces for learning algorithms and apply different hyperparameter (HP) optimization (HPO) techniques to search such a space for a well-performing hyperparameter configuration (HPC).

Exercises

Again, we are looking at the german_credit data set and the corresponding task (you can quickly load the task with tsk("german_credit")). We want to train a k-NN model but ask ourselves what the best choice of \(k\) might be. Furthermore, we are not sure how to set other HPs of the learner, e.g., whether we should scale the data or not. In this exercise, we conduct HPO for k-NN to automatically find a good HPC.

library(mlr3verse)
task = tsk("german_credit")
Recap: k-NN k-NN is a machine learning method that predicts new data by aggregating the responses of the k nearest neighbors (e.g., by majority vote in classification, by averaging in regression).
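
As a quick illustration (not part of the exercise), a k-NN learner can be trained and used for prediction directly; the choice k = 5 below is arbitrary and assumes the kknn package is installed:

lrn_demo = lrn("classif.kknn", k = 5)  # arbitrary k, just for illustration
lrn_demo$train(task)
# Score on the training data; this is an optimistic estimate, shown only to
# demonstrate the train/predict/score workflow
lrn_demo$predict(task)$score(msr("classif.ce"))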

Parameter spaces

Define a meaningful search space for the HPs k and scale. You can check out the help page via lrn("classif.kknn")$help() for an overview of the k-NN learner.

Hint 1 Each learner has a slot param_set that contains all HPs that can be used for tuning. In this use case, we tune the learner with the key "classif.kknn". A search space is defined with ps(), using p_int, p_dbl, p_fct, or p_lgl for the individual HPs.
Hint 2
library(mlr3tuning)

search_space = ps(
  k = p_int(...),
  scale = ...
)

Solution

Click me
library(mlr3tuning)

search_space = ps(
  k = p_int(1, 100),
  scale = p_lgl()
)
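
To double-check which HPs exist and what their feasible ranges are, you can print the learner's parameter set (a quick sanity check, not required for the exercise):

# Overview of all HPs of the k-NN learner, including types, ranges, and defaults
lrn("classif.kknn")$param_set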

Hyperparameter optimization

Now, we want to tune the k-NN model using the search space from the previous exercise. As resampling strategy, we use 3-fold cross-validation. The tuning strategy should be a random search. As termination criterion, we choose 40 evaluations.

Hint 1

The elements required for the tuning are:

  • Task: German credit
  • Algorithm: k-NN algorithm from lrn()
  • Resampling: 3-fold cross validation using rsmp()
  • Terminator: 40 evaluations using trm()
  • Search space: See previous exercise
  • We use the default performance measure (msr("classif.ce") for classification and msr("regr.mse") for regression)
The tuning instance is then defined by calling ti(). The random search optimization algorithm is obtained from tnr() with the corresponding key as argument. Furthermore, we allow parallel computations and set both the batch size and the number of cores to four.
Hint 2

The optimization algorithm is obtained from tnr() with the corresponding key as argument. Furthermore, we allow parallel computations using four cores:

library(mlr3)
library(mlr3learners)
library(mlr3tuning)

future::plan("multicore", workers = 4L)

task = tsk(...)
lrn_knn = lrn(...)

search_space = ps(
  k = p_int(1, 100),
  scale = p_lgl()
)
resampling = rsmp(...)

terminator = trm(..., ... = 40L)

instance = ti(
  task = ...,
  learner = ...,
  resampling = ...,
  terminator = ...,
  search_space = ...
)

optimizer = tnr(...)
optimizer$...(...)
Finally, the optimization is started by passing the tuning instance to the $optimize() method of the tuner.

Solution

Click me
library(mlr3)
library(mlr3learners)
library(mlr3tuning)

future::plan("multicore", workers = 4L)

task = tsk("german_credit")
lrn_knn = lrn("classif.kknn")

search_space = ps(
  k = p_int(1, 100),
  scale = p_lgl()
)
resampling = rsmp("cv", folds = 3L)

terminator = trm("evals", n_evals = 40L)

instance = ti(
  task = task,
  learner = lrn_knn,
  resampling = resampling,
  terminator = terminator,
  search_space = search_space
)

optimizer = tnr("random_search", batch_size = 4L)

optimizer$optimize(instance)
## INFO  [08:14:41.445] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchRandomSearch>' and '<TerminatorEvals> [n_evals=40, k=0]'
## INFO  [08:14:41.537] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:43.525] [bbotk] Result of batch 1:
## INFO  [08:14:43.532] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:43.532] [bbotk]  15  TRUE    0.26901        0      0             0.29 11379ca7-a94b-4b67-afb4-125c30654aed
## INFO  [08:14:43.532] [bbotk]  18 FALSE    0.32799        0      0             0.16 3d654a31-8f21-45c0-912f-7257a68792ca
## INFO  [08:14:43.532] [bbotk]  34 FALSE    0.32201        0      0             0.21 818c6b0c-5f4e-4a81-babd-d56095d51c4c
## INFO  [08:14:43.532] [bbotk]  19 FALSE    0.32800        0      0             0.17 f2747f6d-4127-4c3e-bda2-5d4f596a6cfd
## INFO  [08:14:43.559] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:44.681] [bbotk] Result of batch 2:
## INFO  [08:14:44.685] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:44.685] [bbotk]   2 FALSE    0.37899        0      0             0.15 459332a9-4a9a-4ec3-85f8-ab77adc33464
## INFO  [08:14:44.685] [bbotk]  62  TRUE    0.28104        0      0             0.24 ebbbaafa-e674-4d1e-9f08-27b12d0230b3
## INFO  [08:14:44.685] [bbotk]  47 FALSE    0.31502        0      0             0.17 37149511-3adb-4fb2-af88-f374c6504fb8
## INFO  [08:14:44.685] [bbotk]  36  TRUE    0.26803        0      0             0.22 dcc80a8d-801b-4573-9e8d-5b8a48a0cf71
## INFO  [08:14:44.698] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:46.109] [bbotk] Result of batch 3:
## INFO  [08:14:46.114] [bbotk]    k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:46.114] [bbotk]    7  TRUE    0.29301        0      0             0.19 0e16e3d8-7490-4743-b7b0-2f79b023b697
## INFO  [08:14:46.114] [bbotk]   24  TRUE    0.27103        0      0             0.20 a3305510-c493-4369-8d2e-78e74c7ed15d
## INFO  [08:14:46.114] [bbotk]  100  TRUE    0.28703        0      0             0.25 1fa8d4a2-9e65-4c9c-960f-1c4dcf218de3
## INFO  [08:14:46.114] [bbotk]   68 FALSE    0.30302        0      0             0.21 1aa86b2c-540b-4de4-8259-a6af1500aeff
## INFO  [08:14:46.129] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:47.320] [bbotk] Result of batch 4:
## INFO  [08:14:47.326] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:47.326] [bbotk]   6  TRUE    0.29301        0      0             0.22 d61cdb48-2a0e-44a7-80ab-45ed68c7c7c4
## INFO  [08:14:47.326] [bbotk]  24  TRUE    0.27103        0      0             0.21 e9806185-db33-42e1-bc52-8a4c28af7730
## INFO  [08:14:47.326] [bbotk]  24  TRUE    0.27103        0      0             0.21 901d0e92-7bd0-44aa-886c-b552fa7c7a70
## INFO  [08:14:47.326] [bbotk]  24 FALSE    0.31702        0      0             0.18 9b5ffef4-94e4-4969-9d4a-80c1621f7b2d
## INFO  [08:14:47.347] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:48.686] [bbotk] Result of batch 5:
## INFO  [08:14:48.690] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:48.690] [bbotk]   9 FALSE    0.34299        0      0             0.14 5cdbd7fb-62de-4697-b46b-f26e71877460
## INFO  [08:14:48.690] [bbotk]  95  TRUE    0.28603        0      0             0.35 c84e83ce-151f-4313-96a7-f76e32f92d46
## INFO  [08:14:48.690] [bbotk]  73  TRUE    0.28003        0      0             0.22 7cf74963-9943-4e99-8baa-35623ede70a5
## INFO  [08:14:48.690] [bbotk]  93 FALSE    0.30302        0      0             0.24 1f0ead63-c919-40fb-8e4b-3fc155aff8fb
## INFO  [08:14:48.704] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:50.059] [bbotk] Result of batch 6:
## INFO  [08:14:50.063] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:50.063] [bbotk]  78  TRUE    0.28304        0      0             0.27 c8ebe499-74a6-4bbb-82ef-9e1b83b61c70
## INFO  [08:14:50.063] [bbotk]  34 FALSE    0.32201        0      0             0.19 60717672-45d4-46b3-b3aa-f8aadf1570d8
## INFO  [08:14:50.063] [bbotk]  86  TRUE    0.28403        0      0             0.28 43e3c144-82c0-4cd1-85b9-e2c22454bab1
## INFO  [08:14:50.063] [bbotk]  59  TRUE    0.28203        0      0             0.29 428eca44-23db-49c1-906f-64bddcac0f14
## INFO  [08:14:50.077] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:51.460] [bbotk] Result of batch 7:
## INFO  [08:14:51.465] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:51.465] [bbotk]  93 FALSE    0.30302        0      0             0.26 61a8310d-35cc-4f43-9d4c-08c3cf8d1a86
## INFO  [08:14:51.465] [bbotk]  44 FALSE    0.31301        0      0             0.26 6cb53a5f-eef1-4d7e-ba0d-cfb005d903ef
## INFO  [08:14:51.465] [bbotk]  87  TRUE    0.28403        0      0             0.25 17db077d-e69d-46db-9dab-5529a0145fa2
## INFO  [08:14:51.465] [bbotk]  31  TRUE    0.26003        0      0             0.23 338bfe1f-4052-4611-8956-a5ee852541aa
## INFO  [08:14:51.481] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:52.658] [bbotk] Result of batch 8:
## INFO  [08:14:52.662] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:52.662] [bbotk]  66  TRUE    0.27803        0      0             0.26 207e1ef5-b321-4894-9602-2f45fe00c2f2
## INFO  [08:14:52.662] [bbotk]  52  TRUE    0.28403        0      0             0.21 c5a7718a-c681-430f-bacc-e8aa9498a078
## INFO  [08:14:52.662] [bbotk]  15 FALSE    0.32799        0      0             0.15 1658dcfa-ce9a-48eb-a3a5-f4846a097b89
## INFO  [08:14:52.662] [bbotk]  62 FALSE    0.30102        0      0             0.20 f806dc91-bfc6-428e-b4cd-7cdb67e56d48
## INFO  [08:14:52.678] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:53.875] [bbotk] Result of batch 9:
## INFO  [08:14:53.879] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:53.879] [bbotk]  51 FALSE    0.31002        0      0             0.23 3832dce8-d650-4ca3-bb65-9643df401990
## INFO  [08:14:53.879] [bbotk]  28  TRUE    0.25903        0      0             0.22 99ba52a9-0547-4d71-a2ca-c72233e54e43
## INFO  [08:14:53.879] [bbotk]  60 FALSE    0.30102        0      0             0.24 45250c80-6e43-4a10-8a57-bff608646769
## INFO  [08:14:53.879] [bbotk]  74 FALSE    0.30302        0      0             0.24 6d27f4ec-d990-4950-9228-9a25158aad91
## INFO  [08:14:53.895] [bbotk] Evaluating 4 configuration(s)
## INFO  [08:14:55.633] [bbotk] Result of batch 10:
## INFO  [08:14:55.641] [bbotk]   k scale classif.ce warnings errors runtime_learners                                uhash
## INFO  [08:14:55.641] [bbotk]  84 FALSE    0.30302        0      0             0.25 b2de952b-e747-4538-827c-534bb8449794
## INFO  [08:14:55.641] [bbotk]  12  TRUE    0.27801        0      0             0.21 31e3826a-0100-4910-b5f9-6dd80c825a7b
## INFO  [08:14:55.641] [bbotk]  96  TRUE    0.28603        0      0             0.35 e1d5962d-2972-4114-a591-7c58db6519e4
## INFO  [08:14:55.641] [bbotk]  75 FALSE    0.30302        0      0             0.57 cb4cf6b0-d9ec-4581-b8a7-d0217105b433
## INFO  [08:14:55.686] [bbotk] Finished optimizing after 40 evaluation(s)
## INFO  [08:14:55.689] [bbotk] Result:
## INFO  [08:14:55.694] [bbotk]      k  scale learner_param_vals  x_domain classif.ce
## INFO  [08:14:55.694] [bbotk]  <int> <lgcl>             <list>    <list>      <num>
## INFO  [08:14:55.694] [bbotk]     28   TRUE          <list[2]> <list[2]>    0.25903
##        k  scale learner_param_vals  x_domain classif.ce
##    <int> <lgcl>             <list>    <list>      <num>
## 1:    28   TRUE          <list[2]> <list[2]>    0.25903
instance$result_y
## classif.ce 
##    0.25903
instance$result
##        k  scale learner_param_vals  x_domain classif.ce
##    <int> <lgcl>             <list>    <list>      <num>
## 1:    28   TRUE          <list[2]> <list[2]>    0.25903
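
A natural follow-up (not shown in the exercise) is to refit the learner on the full task with the best configuration found:

# Set the tuned HPs on the learner and train on the complete task
lrn_knn$param_set$values = instance$result_learner_param_vals
lrn_knn$train(task)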

Syntactic sugar to define the HP space

mlr3 provides syntactic sugar to shorten the definition of the search space. It is possible to specify the HP ranges directly in the learner construction:

Click me
task = tsk("german_credit")

lrn_knn = lrn("classif.kknn", k = to_tune(1, 100), scale = to_tune())
This adjusts the parameter set (lrn_knn$param_set) attached to the learner and flags the affected HPs as “tunable”.
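
The resulting search space can be inspected, and the tune() helper runs the whole tuning loop without constructing the instance manually (a sketch, assuming a current mlr3tuning version):

# Search space implied by the to_tune() tokens
lrn_knn$param_set$search_space()

# Convenience wrapper: builds the tuning instance and optimizes it in one call
instance = tune(
  tuner = tnr("random_search", batch_size = 4L),
  task = task,
  learner = lrn_knn,
  resampling = rsmp("cv", folds = 3L),
  term_evals = 40L
)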

Analyzing the tuning archive

Inspect the archive of hyperparameters evaluated during the tuning process with instance$archive. Create a simple plot that illustrates the association between the hyperparameter k and the estimated classification error.

Solution:

Click me:
plot(x = instance$archive$data$k, y = instance$archive$data$classif.ce,
  xlab = "k", ylab = "classif.ce")

Visualizing hyperparameters

To see how effective the tuning was, it is useful to look at the effect of the HPs on the performance. It also helps us to understand how important different HPs are. Therefore, access the archive of the tuning instance and visualize the effect.

Hint 1 Access the archive of the tuning instance to get all information about the tuning. You can use all known plotting techniques after transforming it to a data.table.
Hint 2
arx = as...(instance$...)

library(ggplot2)
library(patchwork)

gg_k = ggplot(..., aes(...)) + ...()
gg_scale = ggplot(..., aes(...)) + ...()

gg_k + gg_scale & theme(legend.position = "bottom")

Solution

Click me
arx = as.data.table(instance$archive)

library(ggplot2)
library(patchwork)

gg_k = ggplot(arx, aes(x = k, y = classif.ce)) + geom_point()
gg_scale = ggplot(arx, aes(x = scale, y = classif.ce, fill = scale)) + geom_boxplot()

gg_k + gg_scale & theme(legend.position = "bottom")


## ALTERNATIVE:

# The `mlr3viz` package automatically creates plots that give an idea of the
# effect of the HPs:

library(mlr3viz)

autoplot(instance)

The number of neighbors k and scale both seem to have a big impact on the performance of the model.

Hyperparameter dependencies

When defining a hyperparameter search space via the ps() function, we sometimes encounter nested search spaces, also called hyperparameter dependencies. SVMs are one example: the hyperparameter degree is only relevant if the hyperparameter kernel is set to "polynomial". Therefore, we only have to consider different configurations for degree when we evaluate candidate configurations with a polynomial kernel. Construct a search space for an SVM with hyperparameters kernel (candidates should be "polynomial" and "radial") and degree (integer ranging from 1 to 3, but only for polynomial kernels), and account for the dependency structure.

Hint 1 In the p_fct, p_dbl, … functions, we specify this using the depends argument, which takes an expression of the form <param> == <value> or <param> %in% <vector>.

Solution:

Click me:
ps(
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = (kernel == "polynomial"))
)
## <ParamSet(2)>
## Key: <id>
##        id    class lower upper nlevels        default parents  value
##    <char>   <char> <num> <num>   <num>         <list>  <list> <list>
## 1: degree ParamInt     1     3       3 <NoDefault[0]>  kernel [NULL]
## 2: kernel ParamFct    NA    NA       2 <NoDefault[0]>  [NULL] [NULL]
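
To see the dependency in action, you can sample random configurations from such a set; degree is only assigned a value when kernel is "polynomial" and is NA otherwise (a small sketch using paradox's design generators):

library(paradox)

sp_svm = ps(
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = (kernel == "polynomial"))
)
# Rows with kernel == "radial" have degree = NA due to the dependency
generate_design_random(sp_svm, 6)$data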

Hyperparameter transformations

When tuning positive hyperparameters that span a broad range, searching on a logarithmic scale can be more efficient. This approach works especially well if we want to test many small values, but also a few very large ones. By selecting values on a logarithmic scale and then exponentiating them, we concentrate the exploration on smaller values while still considering the possibility of very large ones, allowing for a targeted and efficient search for good hyperparameter configurations.
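
The effect is easy to see by sampling uniformly on the log scale and mapping the draws back with exp(): every order of magnitude is then equally likely, whereas uniform sampling on the original scale would almost always return large values. A minimal numeric illustration (the seed is arbitrary):

set.seed(123)  # arbitrary seed, for reproducibility only
# Uniform draws on [log(1e-5), log(1e5)], transformed back to the original scale
sort(exp(runif(5, log(1e-5), log(1e5))))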

A simple way to do this is to pass logscale = TRUE when using to_tune() to define the parameter search space while constructing the learner:

lrn = lrn("classif.svm", cost = to_tune(1e-5, 1e5, logscale = TRUE))
lrn$param_set$search_space()
## <ParamSet(1)>
##        id    class   lower  upper nlevels        default  value
##    <char>   <char>   <num>  <num>   <num>         <list> <list>
## 1:   cost ParamDbl -11.513 11.513     Inf <NoDefault[0]> [NULL]
## Trafo is set.

To manually create the same transformation, we can pass the transformation to the more general trafo argument in p_dbl() and related functions and set the bounds using the log() function. For the following search space, implement a logarithmic transformation. The output should look exactly like the search space above.

# Change this to a log trafo:
ps(cost = p_dbl(1e-5, 1e5))

Solution:

Click me:
search_space = ps(cost = p_dbl(log(1e-5), log(1e5),
  trafo = function(x) exp(x))) # alternatively: 'trafo = exp'
search_space
## <ParamSet(1)>
##        id    class   lower  upper nlevels        default  value
##    <char>   <char>   <num>  <num>   <num>         <list> <list>
## 1:   cost ParamDbl -11.513 11.513     Inf <NoDefault[0]> [NULL]
## Trafo is set.
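
To confirm that the transformation is applied, you can draw a small random design and transpose it; $transpose() applies the trafo by default (a quick check using paradox's design generators):

library(paradox)

design = generate_design_random(search_space, 3)
design$transpose()  # list of HPCs with the trafo (exp) already applied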

Summary

  • In this use case, we learned how to define search spaces for learner HPs.
  • Based on this search space, we defined a tuning strategy to try a number of random configurations.
  • We visualized the tested configurations to get an idea of how the HPs affect the performance of our learner.
  • We learned about scale transformations in tuning.
  • Finally, we added a transformation to favor a certain range of the parameter space.

Further information

Other (more advanced) tuning algorithms:

  • Simulated annealing: Random HPCs are sampled and accepted based on an acceptance probability function, which states how likely an improvement in performance is. The method is implemented in tnr("gensa").
  • Model-based optimization (MBO): Proposes the most promising HPC by estimating the expected improvement of new candidate points. Available in mlr3mbo.
  • Multifidelity optimization/successive halving: This technique starts with multiple HPCs and discards unpromising candidates early; this is repeated several times to use the tuning budget efficiently. The method is implemented in mlr3hyperband.