Supervised Learning I

Resampling with mlr3


You will learn how to estimate model performance with mlr3 using resampling techniques such as 5-fold cross-validation. Additionally, you will compare a k-NN model against a logistic regression model.

German Credit Data

We work with the German credit data. You can either create the corresponding mlr3 task manually, as we did before, or use the pre-defined task that already ships with the mlr3 package (listing the pre-defined tasks shows which others are available to play around with).

library(mlr3)
task = tsk("german_credit")
task # print a summary of the task
## <TaskClassif:german_credit> (1000 x 21): German Credit
## * Target: credit_risk
## * Properties: twoclass
## * Features (20):
##   - fct (14): credit_history, employment_duration, foreign_worker, housing, job, other_debtors,
##     other_installment_plans, people_liable, personal_status_sex, property, purpose, savings,
##     status, telephone
##   - int (3): age, amount, duration
##   - ord (3): installment_rate, number_credits, present_residence
task$positive # (check the positive class)
## [1] "good"
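
To see which other pre-defined tasks are available, one option is to convert the task dictionary to a table. The sketch below assumes the dictionary object mlr_tasks exported by mlr3; the variable name tasks is only illustrative.

```r
library(mlr3)

# Convert the task dictionary into a table with one row per pre-defined task
tasks = as.data.table(mlr_tasks)

# The keys are the names that can be passed to tsk()
tasks$key
```

Any key listed here can be used in the same way as "german_credit" above.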

Exercise: Fairly evaluate the performance of two learners

We first create two mlr3 learners, a logistic regression and a KNN learner. We then compare their performance via resampling.

Create the learners

Create a logistic regression learner (store it as an R object called log_reg) and a KNN learner with \(k = 5\) (store it as an R object called knn).

Show Hint 1: Check the list of available learners to find the appropriate one.
Show Hint 2: Make sure to have the kknn package installed.


Click me:
library(mlr3learners) # provides classif.log_reg and classif.kknn
log_reg = lrn("classif.log_reg")
knn = lrn("classif.kknn", k = 5) # requires the kknn package
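
If you are unsure which learner keys exist, a possible way to look them up is to convert the learner dictionary to a table. This is a minimal sketch, assuming the mlr3learners package is installed; the names learners and classif_keys are illustrative only.

```r
library(mlr3)
library(mlr3learners) # registers additional learners in the dictionary

# One row per registered learner, including its key and task type
learners = as.data.table(mlr_learners)

# Keep only the keys of classification learners
classif_keys = learners$key[learners$task_type == "classif"]
classif_keys
```

Both keys used in the solution above should appear in classif_keys.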

Set up a resampling instance

Use mlr3 to set up a resampling instance and store it as an R object called cv5. Here, we aim for 5-fold cross-validation. The resampling techniques implemented in mlr3 can be listed in a table.

Show Hint 1: Look at the table of available resampling techniques and use the rsmp function to set up a 5-fold cross-validation instance. Store the result of the rsmp function in an R object called cv5.
Show Hint 2: rsmp("cv") by default sets up a 10-fold cross-validation instance. The number of folds can be set using an additional argument (see the params column of that table).
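
The table of resampling techniques mentioned in the hints can be produced roughly as follows. This sketch assumes the dictionary object mlr_resamplings exported by mlr3; the names resamplings and default_folds are illustrative.

```r
library(mlr3)

# One row per resampling technique, with its key and tunable parameters
resamplings = as.data.table(mlr_resamplings)
resamplings$key

# Parameters accepted by cross-validation (the params column is a list column)
resamplings$params[resamplings$key == "cv"]

# Default number of folds when no argument is given
default_folds = rsmp("cv")$param_set$values$folds
default_folds
```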


Click me:
cv5 = rsmp("cv", folds = 5)
## <ResamplingCV>: Cross-Validation
## * Iterations: 5
## * Instantiated: FALSE
## * Parameters: folds=5

Note: Instantiated: FALSE means that we have only created the resampling object and have not yet applied it to a task, so no train/test splits have been drawn.
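
To make the meaning of the Instantiated flag concrete, the following sketch instantiates a resampling on the task by hand and inspects the first split. The resample() function takes care of this step itself when needed, so this is only for illustration; the variable names cv5_demo, train_ids and test_ids are made up for this example.

```r
library(mlr3)

task = tsk("german_credit")
cv5_demo = rsmp("cv", folds = 5)

# Draw the actual train/test splits for this task
cv5_demo$instantiate(task)
cv5_demo$is_instantiated

# Row ids used for training and testing in the first iteration
train_ids = cv5_demo$train_set(1)
test_ids = cv5_demo$test_set(1)
length(train_ids)
length(test_ids)
```

Within one iteration the training and test ids do not overlap, and together they cover all rows of the task.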

Run the resampling

After having created a resampling instance, use it to apply the chosen resampling technique to both previously created learners.

Show Hint 1: You need to supply the task, the learner and the previously created resampling instance as arguments to the resample function. See ?resample for further details and examples.
Show Hint 2:

The key ingredients for resample() are a task (e.g., created by as_task_classif() or tsk()), a learner (created by lrn()) and a resampling strategy (created by rsmp()), e.g.,

resample(task = task, learner = log_reg, resampling = cv5)


Click me:
res_log_reg = resample(task, log_reg, cv5)
res_knn = resample(task, knn, cv5)
## <ResampleResult> with 5 resampling iterations
##        task_id      learner_id resampling_id iteration warnings errors
##  german_credit classif.log_reg            cv         1        0      0
##  german_credit classif.log_reg            cv         2        0      0
##  german_credit classif.log_reg            cv         3        0      0
##  german_credit classif.log_reg            cv         4        0      0
##  german_credit classif.log_reg            cv         5        0      0
## <ResampleResult> with 5 resampling iterations
##        task_id   learner_id resampling_id iteration warnings errors
##  german_credit classif.kknn            cv         1        0      0
##  german_credit classif.kknn            cv         2        0      0
##  german_credit classif.kknn            cv         3        0      0
##  german_credit classif.kknn            cv         4        0      0
##  german_credit classif.kknn            cv         5        0      0


Compute the cross-validated classification accuracy of both models. Which learner performed better?

Show Hint 1: Use msr("classif.acc") and the aggregate method of the resample result object.
Show Hint 2: Use res_knn$aggregate(msr(...)) to obtain the classification accuracy averaged across all folds.


Click me:
## classif.acc 
##        0.72
## classif.acc 
##       0.747

Note: Use e.g. res_knn$score(msr(...)) to look at the results of each individual fold.
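
Putting the steps together, one way to compute the aggregated and per-fold accuracies is sketched below. It assumes mlr3learners and kknn are installed. As a choice made for this sketch (not part of the exercise solution), the resampling is instantiated once up front so that both learners are evaluated on identical folds. The exact numbers depend on the random splits, so none are shown here.

```r
library(mlr3)
library(mlr3learners)

set.seed(1) # arbitrary seed, only to make the splits reproducible

task = tsk("german_credit")
log_reg = lrn("classif.log_reg")
knn = lrn("classif.kknn", k = 5)

# Instantiate once so both learners see the same train/test splits
cv5 = rsmp("cv", folds = 5)
cv5$instantiate(task)

res_log_reg = resample(task, log_reg, cv5)
res_knn = resample(task, knn, cv5)

acc = msr("classif.acc")

# Accuracy averaged across the five folds
acc_log_reg = res_log_reg$aggregate(acc)
acc_knn = res_knn$aggregate(acc)
acc_log_reg
acc_knn

# Accuracy of each individual fold for the KNN learner
scores_knn = res_knn$score(acc)
scores_knn[, c("iteration", "classif.acc")]
```

The aggregated value is the mean of the per-fold accuracies, which is why looking at the individual folds helps to judge how stable the estimate is.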


We can now apply different resampling methods to estimate the performance of different learners. Compared with a single train/test split, resampling gives a performance estimate with lower variance, which enables us to compare learners fairly.