Goal
After this exercise, you should be able to define search spaces for learning algorithms and apply different hyperparameter (HP) optimization (HPO) techniques to search through the search space to find a well-performing hyperparameter configuration (HPC).
Exercises
Again, we are looking at the german_credit data set and the corresponding task (you can quickly load the task with tsk("german_credit")). We want to train a k-NN model but ask ourselves what the best choice of \(k\) might be. Furthermore, we are not sure how to set other HPs of the learner, e.g., whether we should scale the data or not. In this exercise, we conduct HPO for k-NN to automatically find a good HPC.
Recap: k-NN
k-NN is a machine learning method that predicts new data by aggregating the responses of the k nearest neighbors in the training data: the majority class for classification and the average response for regression.
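The following base-R sketch illustrates this idea for classification (knn_predict is a hypothetical helper written only for illustration, not part of mlr3; the iris data is used purely as an example):
knn_predict = function(x_new, X_train, y_train, k = 5) {
  # Euclidean distance of x_new to every training point
  dists = sqrt(colSums((t(X_train) - x_new)^2))
  nn = order(dists)[seq_len(k)]          # indices of the k nearest neighbors
  names(which.max(table(y_train[nn])))   # majority vote over their classes
}
X = as.matrix(iris[, 1:4])
knn_predict(X[1, ], X[-1, ], iris$Species[-1], k = 7)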
Parameter spaces
Define a meaningful search space for the HPs k and scale. You can check out the help page lrn("classif.kknn")$help() for an overview of the k-NN learner.
Hint 1
Each learner has a slot param_set that contains all HPs that can be used for tuning. In this use case, we tune the learner with the key "classif.kknn". The search space is defined with ps(), using p_int(), p_dbl(), p_fct(), or p_lgl() for the individual HPs.
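For illustration, a search space for a different learner, a classification tree, could look like the following sketch (bounds chosen arbitrarily, so this is not the answer for k-NN):
library(paradox)  # provides ps(), p_int(), p_dbl(), p_fct(), p_lgl()
ps(
  minsplit = p_int(lower = 1, upper = 50),      # integer HP
  cp       = p_dbl(lower = 0.001, upper = 0.1), # numeric HP
  maxdepth = p_int(lower = 1, upper = 30)       # integer HP
)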
Hyperparameter optimization
Now, we want to tune the k-NN model with the search space from the previous exercise. As resampling strategy, we use a 3-fold cross-validation. The tuning strategy should be a random search. As termination criterion, we choose 40 evaluations.
Hint 1
The elements required for the tuning are:
- Task: German credit
- Algorithm: k-NN learner from lrn()
- Resampling: 3-fold cross-validation using rsmp()
- Terminator: 40 evaluations using trm()
- Search space: See previous exercise
- Measure: We use the default performance measure (msr("classif.ce") for classification and msr("regr.mse") for regression)
These elements are collected in a tuning instance with ti(). The random search optimization algorithm is obtained from tnr() with the corresponding key as argument. Furthermore, we allow parallel computations and set the batch size as well as the number of cores to four.
Hint 2
The optimization algorithm is obtained from tnr() with the corresponding key as argument. Furthermore, we allow parallel computations using four cores:
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
future::plan("multicore", workers = 4L)
task = tsk(...)
lrn_knn = lrn(...)
search_space = ps(
  k = p_int(1, 100),
  scale = p_lgl()
)
resampling = rsmp(...)
terminator = trm(..., ... = 40L)
instance = ti(
  task = ...,
  learner = ...,
  resampling = ...,
  terminator = ...,
  search_space = ...
)
optimizer = tnr(...)
optimizer$...(...)
The tuning is started by calling the $optimize() method of the tuner.
Solution
Click me
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
future::plan("multicore", workers = 4L)
task = tsk("german_credit")
lrn_knn = lrn("classif.kknn")
search_space = ps(
  k = p_int(1, 100),
  scale = p_lgl()
)
resampling = rsmp("cv", folds = 3L)
terminator = trm("evals", n_evals = 40L)
instance = ti(
  task = task,
  learner = lrn_knn,
  resampling = resampling,
  terminator = terminator,
  search_space = search_space
)
optimizer = tnr("random_search", batch_size = 4L)
optimizer$optimize(instance)
## INFO [08:14:41.445] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerBatchRandomSearch>' and '<TerminatorEvals> [n_evals=40, k=0]'
## INFO [08:14:41.537] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:43.525] [bbotk] Result of batch 1:
## INFO [08:14:43.532] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:43.532] [bbotk] 15 TRUE 0.26901 0 0 0.29 11379ca7-a94b-4b67-afb4-125c30654aed
## INFO [08:14:43.532] [bbotk] 18 FALSE 0.32799 0 0 0.16 3d654a31-8f21-45c0-912f-7257a68792ca
## INFO [08:14:43.532] [bbotk] 34 FALSE 0.32201 0 0 0.21 818c6b0c-5f4e-4a81-babd-d56095d51c4c
## INFO [08:14:43.532] [bbotk] 19 FALSE 0.32800 0 0 0.17 f2747f6d-4127-4c3e-bda2-5d4f596a6cfd
## INFO [08:14:43.559] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:44.681] [bbotk] Result of batch 2:
## INFO [08:14:44.685] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:44.685] [bbotk] 2 FALSE 0.37899 0 0 0.15 459332a9-4a9a-4ec3-85f8-ab77adc33464
## INFO [08:14:44.685] [bbotk] 62 TRUE 0.28104 0 0 0.24 ebbbaafa-e674-4d1e-9f08-27b12d0230b3
## INFO [08:14:44.685] [bbotk] 47 FALSE 0.31502 0 0 0.17 37149511-3adb-4fb2-af88-f374c6504fb8
## INFO [08:14:44.685] [bbotk] 36 TRUE 0.26803 0 0 0.22 dcc80a8d-801b-4573-9e8d-5b8a48a0cf71
## INFO [08:14:44.698] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:46.109] [bbotk] Result of batch 3:
## INFO [08:14:46.114] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:46.114] [bbotk] 7 TRUE 0.29301 0 0 0.19 0e16e3d8-7490-4743-b7b0-2f79b023b697
## INFO [08:14:46.114] [bbotk] 24 TRUE 0.27103 0 0 0.20 a3305510-c493-4369-8d2e-78e74c7ed15d
## INFO [08:14:46.114] [bbotk] 100 TRUE 0.28703 0 0 0.25 1fa8d4a2-9e65-4c9c-960f-1c4dcf218de3
## INFO [08:14:46.114] [bbotk] 68 FALSE 0.30302 0 0 0.21 1aa86b2c-540b-4de4-8259-a6af1500aeff
## INFO [08:14:46.129] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:47.320] [bbotk] Result of batch 4:
## INFO [08:14:47.326] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:47.326] [bbotk] 6 TRUE 0.29301 0 0 0.22 d61cdb48-2a0e-44a7-80ab-45ed68c7c7c4
## INFO [08:14:47.326] [bbotk] 24 TRUE 0.27103 0 0 0.21 e9806185-db33-42e1-bc52-8a4c28af7730
## INFO [08:14:47.326] [bbotk] 24 TRUE 0.27103 0 0 0.21 901d0e92-7bd0-44aa-886c-b552fa7c7a70
## INFO [08:14:47.326] [bbotk] 24 FALSE 0.31702 0 0 0.18 9b5ffef4-94e4-4969-9d4a-80c1621f7b2d
## INFO [08:14:47.347] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:48.686] [bbotk] Result of batch 5:
## INFO [08:14:48.690] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:48.690] [bbotk] 9 FALSE 0.34299 0 0 0.14 5cdbd7fb-62de-4697-b46b-f26e71877460
## INFO [08:14:48.690] [bbotk] 95 TRUE 0.28603 0 0 0.35 c84e83ce-151f-4313-96a7-f76e32f92d46
## INFO [08:14:48.690] [bbotk] 73 TRUE 0.28003 0 0 0.22 7cf74963-9943-4e99-8baa-35623ede70a5
## INFO [08:14:48.690] [bbotk] 93 FALSE 0.30302 0 0 0.24 1f0ead63-c919-40fb-8e4b-3fc155aff8fb
## INFO [08:14:48.704] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:50.059] [bbotk] Result of batch 6:
## INFO [08:14:50.063] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:50.063] [bbotk] 78 TRUE 0.28304 0 0 0.27 c8ebe499-74a6-4bbb-82ef-9e1b83b61c70
## INFO [08:14:50.063] [bbotk] 34 FALSE 0.32201 0 0 0.19 60717672-45d4-46b3-b3aa-f8aadf1570d8
## INFO [08:14:50.063] [bbotk] 86 TRUE 0.28403 0 0 0.28 43e3c144-82c0-4cd1-85b9-e2c22454bab1
## INFO [08:14:50.063] [bbotk] 59 TRUE 0.28203 0 0 0.29 428eca44-23db-49c1-906f-64bddcac0f14
## INFO [08:14:50.077] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:51.460] [bbotk] Result of batch 7:
## INFO [08:14:51.465] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:51.465] [bbotk] 93 FALSE 0.30302 0 0 0.26 61a8310d-35cc-4f43-9d4c-08c3cf8d1a86
## INFO [08:14:51.465] [bbotk] 44 FALSE 0.31301 0 0 0.26 6cb53a5f-eef1-4d7e-ba0d-cfb005d903ef
## INFO [08:14:51.465] [bbotk] 87 TRUE 0.28403 0 0 0.25 17db077d-e69d-46db-9dab-5529a0145fa2
## INFO [08:14:51.465] [bbotk] 31 TRUE 0.26003 0 0 0.23 338bfe1f-4052-4611-8956-a5ee852541aa
## INFO [08:14:51.481] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:52.658] [bbotk] Result of batch 8:
## INFO [08:14:52.662] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:52.662] [bbotk] 66 TRUE 0.27803 0 0 0.26 207e1ef5-b321-4894-9602-2f45fe00c2f2
## INFO [08:14:52.662] [bbotk] 52 TRUE 0.28403 0 0 0.21 c5a7718a-c681-430f-bacc-e8aa9498a078
## INFO [08:14:52.662] [bbotk] 15 FALSE 0.32799 0 0 0.15 1658dcfa-ce9a-48eb-a3a5-f4846a097b89
## INFO [08:14:52.662] [bbotk] 62 FALSE 0.30102 0 0 0.20 f806dc91-bfc6-428e-b4cd-7cdb67e56d48
## INFO [08:14:52.678] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:53.875] [bbotk] Result of batch 9:
## INFO [08:14:53.879] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:53.879] [bbotk] 51 FALSE 0.31002 0 0 0.23 3832dce8-d650-4ca3-bb65-9643df401990
## INFO [08:14:53.879] [bbotk] 28 TRUE 0.25903 0 0 0.22 99ba52a9-0547-4d71-a2ca-c72233e54e43
## INFO [08:14:53.879] [bbotk] 60 FALSE 0.30102 0 0 0.24 45250c80-6e43-4a10-8a57-bff608646769
## INFO [08:14:53.879] [bbotk] 74 FALSE 0.30302 0 0 0.24 6d27f4ec-d990-4950-9228-9a25158aad91
## INFO [08:14:53.895] [bbotk] Evaluating 4 configuration(s)
## INFO [08:14:55.633] [bbotk] Result of batch 10:
## INFO [08:14:55.641] [bbotk] k scale classif.ce warnings errors runtime_learners uhash
## INFO [08:14:55.641] [bbotk] 84 FALSE 0.30302 0 0 0.25 b2de952b-e747-4538-827c-534bb8449794
## INFO [08:14:55.641] [bbotk] 12 TRUE 0.27801 0 0 0.21 31e3826a-0100-4910-b5f9-6dd80c825a7b
## INFO [08:14:55.641] [bbotk] 96 TRUE 0.28603 0 0 0.35 e1d5962d-2972-4114-a591-7c58db6519e4
## INFO [08:14:55.641] [bbotk] 75 FALSE 0.30302 0 0 0.57 cb4cf6b0-d9ec-4581-b8a7-d0217105b433
## INFO [08:14:55.686] [bbotk] Finished optimizing after 40 evaluation(s)
## INFO [08:14:55.689] [bbotk] Result:
## INFO [08:14:55.694] [bbotk] k scale learner_param_vals x_domain classif.ce
## INFO [08:14:55.694] [bbotk] <int> <lgcl> <list> <list> <num>
## INFO [08:14:55.694] [bbotk] 28 TRUE <list[2]> <list[2]> 0.25903
## k scale learner_param_vals x_domain classif.ce
## <int> <lgcl> <list> <list> <num>
## 1: 28 TRUE <list[2]> <list[2]> 0.25903
instance$result_y
## classif.ce
## 0.25903
instance$result
## k scale learner_param_vals x_domain classif.ce
## <int> <lgcl> <list> <list> <num>
## 1: 28 TRUE <list[2]> <list[2]> 0.25903
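A typical follow-up step, sketched here as an assumption beyond what the exercise asks for: apply the best HPC to the learner and train a final model on the full task.
# Set the tuned HPC on the learner and fit the final model on all data.
lrn_knn$param_set$values = instance$result_learner_param_vals
lrn_knn$train(task)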
Syntactic sugar to define the HP space
mlr3 provides syntactic sugar to shorten the process of search space definition: it is possible to directly specify the HP range in the learner construction.
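For the k-NN learner from above, a minimal sketch of this could look as follows (to_tune() bounds taken from the search space defined earlier):
lrn_knn = lrn("classif.kknn",
  k = to_tune(1, 100), # tune k over 1..100
  scale = to_tune()    # tune scale over TRUE/FALSE
)
lrn_knn$param_set$search_space()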
Analyzing the tuning archive
Inspect the archive of hyperparameters evaluated during the tuning process with instance$archive. Create a simple plot that illustrates the association between the hyperparameter k and the estimated classification error.
Visualizing hyperparameters
To see how effective the tuning was, it is useful to look at the effect of the HPs on the performance. It also helps us to understand how important different HPs are. Therefore, access the archive of the tuning instance and visualize the effect.
Hint 1
Access the archive of the tuning instance to get all information about the tuning. You can use all known plotting techniques after transforming it into a data.table.
Solution
Click me
arx = as.data.table(instance$archive)
library(ggplot2)
library(patchwork)
gg_k = ggplot(arx, aes(x = k, y = classif.ce)) + geom_point()
gg_scale = ggplot(arx, aes(x = scale, y = classif.ce, fill = scale)) + geom_boxplot()
gg_k + gg_scale & theme(legend.position = "bottom")
## ALTERNATIVE:
# The `mlr3viz` automatically creates plots for getting an idea of the
# effect of the HPs:
library(mlr3viz)
autoplot(instance)
k and scale seem to have a big impact on the performance of the model.
Hyperparameter dependencies
When defining a hyperparameter search space via the ps() function, we sometimes encounter nested search spaces, also called hyperparameter dependencies. One example of this is the SVM. Here, the hyperparameter degree is only relevant if the hyperparameter kernel is set to "polynomial". Therefore, we only have to consider different configurations for degree if we evaluate candidate configurations with a polynomial kernel. Construct a search space for an SVM with the hyperparameters kernel (candidates should be "polynomial" and "radial") and degree (integer ranging from 1 to 3, but only for polynomial kernels), and account for the dependency structure.
Hint 1
In the p_fct(), p_dbl(), … functions, we specify this using the depends argument, which takes an expression of the form <param> == <value> or <param> %in% <vector>.
Solution:
Click me:
ps(
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = (kernel == "polynomial"))
)
## <ParamSet(2)>
## Key: <id>
## id class lower upper nlevels default parents value
## <char> <char> <num> <num> <num> <list> <list> <list>
## 1: degree ParamInt 1 3 3 <NoDefault[0]> kernel [NULL]
## 2: kernel ParamFct NA NA 2 <NoDefault[0]> [NULL] [NULL]
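To see the dependency in action, one can randomly sample configurations from this search space; degree should then be NA whenever the kernel is not polynomial. A sketch using the paradox helper generate_design_random() (the variable name search_space_svm is introduced here only for illustration):
library(paradox)
search_space_svm = ps(
  kernel = p_fct(c("polynomial", "radial")),
  degree = p_int(1, 3, depends = (kernel == "polynomial"))
)
# Sample 5 random configurations; 'degree' is NA for radial kernels.
generate_design_random(search_space_svm, 5)$data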
Hyperparameter transformations
When tuning non-negative hyperparameters with a broad range, using a logarithmic scale can be more efficient. This approach works especially well if we want to test many small values, but also a few very large ones. By selecting values on a logarithmic scale and then exponentiating them, we ensure a concentrated exploration of smaller values while still considering the possibility of very large values, allowing for a targeted and efficient search in finding optimal hyperparameter configurations.
A simple way to do this is to pass logscale = TRUE when using to_tune() to define the parameter search space while constructing the learner:
lrn = lrn("classif.svm", cost = to_tune(1e-5, 1e5, logscale = TRUE))
lrn$param_set$search_space()
## <ParamSet(1)>
## id class lower upper nlevels default value
## <char> <char> <num> <num> <num> <list> <list>
## 1: cost ParamDbl -11.513 11.513 Inf <NoDefault[0]> [NULL]
## Trafo is set.
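The bounds above are on the log scale; the transformation is applied when a sampled configuration is mapped back to learner parameter values. A sketch of this, assuming the paradox helpers generate_design_random() and Design$transpose():
library(paradox)
design = generate_design_random(lrn$param_set$search_space(), 3)
design$data         # sampled values on the log scale, roughly in [-11.5, 11.5]
design$transpose()  # trafo applied: cost = exp(sampled value)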
To manually create the same transformation, we can pass the transformation to the more general trafo argument in p_dbl() and related functions and set the bounds using the log() function. For the following search space, implement a logarithmic transformation. The output should look exactly like the search space above.
Solution:
Click me:
search_space = ps(cost = p_dbl(log(1e-5), log(1e5),
  trafo = function(x) exp(x))) # alternatively: 'trafo = exp'
search_space
## <ParamSet(1)>
## id class lower upper nlevels default value
## <char> <char> <num> <num> <num> <list> <list>
## 1: cost ParamDbl -11.513 11.513 Inf <NoDefault[0]> [NULL]
## Trafo is set.
Summary
- In this use case, we learned how to define search spaces for learner HPs.
- Based on this search space, we defined a tuning strategy to try a number of random configurations.
- We visualized the tested configurations to get an idea of how the HPs affect the performance of our learner.
- We learned about scale transformations in tuning.
- Finally, we added a transformation to favor a certain range in the parameter space.
Further information
Other (more advanced) tuning algorithms:
- Simulated annealing: Random HPCs are sampled and accepted based on an acceptance probability function which states how likely an improvement in performance is. The method is implemented in tnr("gensa").
- Model-based optimization (MBO): Guess the most promising HPC by estimating the expected improvement of new points. Available in mlr3mbo.
- Multifidelity optimization/successive halving: This technique starts with multiple HPCs and throws away unpromising candidates. This is repeated several times to efficiently use the tuning budget. The method is implemented in mlr3hyperband.
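As a rough sketch of how these tuners could be plugged in instead of the random search (assuming the respective packages are installed; note that tnr("gensa") handles only numeric search spaces, and hyperband additionally requires an HP tagged as a budget parameter):
library(mlr3tuning)
tnr("gensa")                                  # simulated annealing via GenSA
# library(mlr3mbo);       tnr("mbo")          # model-based optimization
# library(mlr3hyperband); tnr("hyperband")    # multifidelity / successive halving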