Goal
You will learn how to estimate model performance with mlr3 using resampling techniques such as 5-fold cross-validation. Additionally, you will compare a k-NN model against a logistic regression model.
German Credit Data
We work with the German credit data. You can either manually create the corresponding mlr3 task as we did before or use a pre-defined task that is already included in the mlr3 package (you can look at the output of as.data.table(mlr_tasks) to see which other pre-defined tasks are included in the mlr3 package).
library(mlr3verse)
task = tsk("german_credit")
task
## <TaskClassif:german_credit> (1000 x 21): German Credit
## * Target: credit_risk
## * Properties: twoclass
## * Features (20):
## - fct (14): credit_history, employment_duration, foreign_worker, housing, job, other_debtors,
## other_installment_plans, people_liable, personal_status_sex, property, purpose, savings,
## status, telephone
## - int (3): age, amount, duration
## - ord (3): installment_rate, number_credits, present_residence
task$positive # (check the positive class)
## [1] "good"
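To browse the other pre-defined tasks mentioned above, the following minimal sketch lists the task dictionary (the selected columns, e.g., key and task_type, are the usual columns of this dictionary table; the key column holds the id to pass to tsk()):
# List all pre-defined tasks shipped with mlr3
as.data.table(mlr_tasks)[, c("key", "task_type", "nrow", "ncol")]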
Exercise: Fairly evaluate the performance of two learners
We first create two mlr3 learners, a logistic regression learner and a KNN learner. We then compare their performance via resampling.
Create the learners
Create a logistic regression learner (store it as an R object called log_reg) and a KNN learner with \(k = 5\) (store it as an R object called knn).
Show Hint 1:
Check as.data.table(mlr_learners) to find the appropriate learner.
Show Hint 2:
Make sure to have the kknn package installed.
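One possible solution, shown as a minimal sketch (the learner keys classif.log_reg and classif.kknn appear in as.data.table(mlr_learners); the kknn package must be installed):
# Logistic regression learner
log_reg = lrn("classif.log_reg")
# k-NN learner with k = 5 (uses the kknn package under the hood)
knn = lrn("classif.kknn", k = 5)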
Set up a resampling instance
Use mlr3 to set up a resampling instance and store it as an R object called cv5. Here, we aim for 5-fold cross-validation. A table of the resampling techniques implemented in mlr3 can be shown by looking at as.data.table(mlr_resamplings).
Show Hint 1:
Look at the table returned by as.data.table(mlr_resamplings) and use the rsmp function to set up a 5-fold cross-validation instance. Store the result of the rsmp function in an R object called cv5.
Show Hint 2:
rsmp("cv") by default sets up a 10-fold cross-validation instance. The number of folds can be set using an additional argument (see the params column of as.data.table(mlr_resamplings)).
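One possible solution, as a minimal sketch:
# 5-fold cross-validation; folds = 5 overrides the default of 10
cv5 = rsmp("cv", folds = 5)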
Run the resampling
After having created a resampling instance, use it to apply the chosen resampling technique to both previously created learners.
Show Hint 1:
You need to supply the task, the learner, and the previously created resampling instance as arguments to the resample function. See ?resample for further details and examples.
Show Hint 2:
The key ingredients for resample() are a task (e.g., created by as_task_classif() or tsk()), a learner (created by lrn()), and a resampling strategy (created by rsmp()), e.g.,
resample(task = task, learner = log_reg, resampling = cv5)
Solution
Click me:
res_log_reg = resample(task, log_reg, cv5)
res_knn = resample(task, knn, cv5)
res_log_reg
## <ResampleResult> with 5 resampling iterations
## task_id learner_id resampling_id iteration warnings errors
## german_credit classif.log_reg cv 1 0 0
## german_credit classif.log_reg cv 2 0 0
## german_credit classif.log_reg cv 3 0 0
## german_credit classif.log_reg cv 4 0 0
## german_credit classif.log_reg cv 5 0 0
res_knn
## <ResampleResult> with 5 resampling iterations
## task_id learner_id resampling_id iteration warnings errors
## german_credit classif.kknn cv 1 0 0
## german_credit classif.kknn cv 2 0 0
## german_credit classif.kknn cv 3 0 0
## german_credit classif.kknn cv 4 0 0
## german_credit classif.kknn cv 5 0 0
Evaluation
Compute the cross-validated classification accuracy of both models. Which learner performed better?
Show Hint 1:
Use msr("classif.acc") and the aggregate method of the resampling object.
Show Hint 2:
Use res_knn$aggregate(msr(...)) to obtain the classification accuracy averaged across all folds.
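A minimal sketch of the evaluation step (the exact accuracy values depend on the random fold assignment, so none are shown here):
# Aggregate the per-fold accuracies into a single cross-validated estimate
acc = msr("classif.acc")
res_log_reg$aggregate(acc)
res_knn$aggregate(acc)
The learner with the higher aggregated accuracy performed better under this resampling.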
Summary
We can now apply different resampling methods to estimate the performance of different learners. Compared to a simple train-test split, resampling yields a performance estimate with lower variance, which enables us to fairly compare different learners.