library(mlr3verse) # All the mlr3 things
library(effectplots) # For effects plotting
::get_logger("mlr3")$set_threshold("error")
lgr
# Penguin Task setup
<- na.omit(palmerpenguins::penguins)
penguins <- as_task_classif(
penguin_task
penguins,target = "species"
)
Feature Importance
Goals of this part:
- Introduce (some) variable importance measures
- Compare measures for different settings
1 Feature Importance
Feature importance falls under the umbrella of interpretability, which is a huge topic, and there’s a lot to explore — we’ll cover some basics and if you’re interested, you can always find more at
- The IML lecture (free slides, videos)
- The Interpretable Machine Learning book
- The
{mlr3}
book chapter
Before we get started with the general methods, it should be noted that some learners bring their own method-specific importance measures. Random Forests (via {ranger}
) for example has some built-in importance metrics, like the corrected Gini impurity:
<- lrn("classif.ranger", importance = "impurity_corrected")
lrn_ranger $train(penguin_task)
lrn_ranger
sort(lrn_ranger$importance(), decreasing = TRUE)
#> bill_length_mm flipper_length_mm bill_depth_mm island
#> 54.2401385 36.3183500 26.7463551 22.5503396
#> body_mass_g sex year
#> 21.5675075 0.8636880 -0.5303487
Which shows us that for our penguins, bill_length_mm
is probably the most relevant feature, whereas body_mass_g
does not turn out to be as important with regard to species classification. THe year
doesn’t seem to be useful for prediction at all — which is perfectly plausible!
1.1 Feature Importance with {mlr3filters}
The mlr3filters
package provides some global, marginal importance methods, meaning they consider the relationship between the target and one feature at a time.
as.data.table(mlr_filters)[1:20, .(key, label)]
#> Key: <key>
#> key label
#> <char> <char>
#> 1: anova ANOVA F-Test
#> 2: auc Area Under the ROC Curve Score
#> 3: boruta Burota
#> 4: carscore Correlation-Adjusted coRrelation Score
#> 5: carsurvscore Correlation-Adjusted coRrelation Survival Score
#> 6: cmim Minimal Conditional Mutual Information Maximization
#> 7: correlation Correlation
#> 8: disr Double Input Symmetrical Relevance
#> 9: find_correlation Correlation-based Score
#> 10: importance Importance Score
#> 11: information_gain Information Gain
#> 12: jmi Joint Mutual Information
#> 13: jmim Minimal Joint Mutual Information Maximization
#> 14: kruskal_test Kruskal-Wallis Test
#> 15: mim Mutual Information Maximization
#> 16: mrmr Minimum Redundancy Maximal Relevancy
#> 17: njmim Minimal Normalised Joint Mutual Information Maximization
#> 18: performance Predictive Performance
#> 19: permutation Permutation Score
#> 20: relief RELIEF
#> key label
One “trick” of the filters package is that it can be used to access the $importance()
that some learners provide on their own, and ranger
provides the impurtiy-importance and permutation feature importance (PFI). We can access either with mlr3filters
directly, but note it retrains the learner:
<- lrn("classif.ranger", importance = "impurity_corrected")
lrn_ranger = flt("importance", learner = lrn_ranger)
filter_importance $calculate(penguin_task)
filter_importance
filter_importance
#> <FilterImportance:importance>: Importance Score
#> Task Types: classif
#> Properties: missings
#> Task Properties: -
#> Packages: mlr3, mlr3learners, ranger
#> Feature types: logical, integer, numeric, character, factor, ordered
#> feature score
#> 1: bill_length_mm 50.976878008
#> 2: flipper_length_mm 36.856975328
#> 3: bill_depth_mm 25.799481552
#> 4: body_mass_g 21.751449304
#> 5: island 20.717986260
#> 6: sex 0.848159529
#> 7: year -0.003874481
mlr3filters
also provides a general implementation for PFI that retrains the learner repeatedly with one feature randomly shuffled.
<- lrn("classif.ranger")
lrn_ranger = flt("permutation", learner = lrn_ranger)
filter_permutation $calculate(penguin_task)
filter_permutation
filter_permutation
#> <FilterPermutation:permutation>: Permutation Score
#> Task Types: classif
#> Properties: missings
#> Task Properties: -
#> Packages: mlr3, mlr3learners, ranger, mlr3measures
#> Feature types: logical, integer, numeric, character, factor, ordered
#> feature score
#> 1: bill_length_mm 1.00000000
#> 2: bill_depth_mm 0.09626437
#> 3: island 0.09195402
#> 4: sex 0.08764368
#> 5: body_mass_g 0.06896552
#> 6: flipper_length_mm 0.04885057
#> 7: year 0.02442529
But that also means we can use PFI for any other learner, such the SVM or XGBoost!
1.2 Your Turn!
- Compute PFI with
ranger
using it’s built-in$importance()
, usingmlr3filters
- Compute PFI for an SVM using the approproate
"permutation"
filter - Comapre the two methods. Do they agree?
# Your code
Note that we need to use the encoding PipeOp
to train the SVM of these on the penguins task, as they can’t handle categorical features automatically:
<- po("encode") %>>%
lrn_svm po("learner", lrn("classif.svm", kernel = "radial", <any other parameter>)) |>
as_learner()
This is just for demonstration — we’d need to use tuned hyperparameters for the SVM for a proper comparison!
<- as_task_classif(
penguin_task na.omit(palmerpenguins::penguins),
target = "species"
)
<- po("encode") %>>%
lrn_svm po("learner", lrn("classif.svm")) |>
as_learner()
= flt("permutation", learner = lrn_svm)
pfi_svm $calculate(penguin_task)
pfi_svm pfi_svm
#> <FilterPermutation:permutation>: Permutation Score
#> Task Types: classif
#> Properties: missings
#> Task Properties: -
#> Packages: mlr3, mlr3pipelines, stats, mlr3learners, e1071, mlr3measures
#> Feature types: logical, integer, numeric, character, factor, ordered,
#> POSIXct, Date
#> feature score
#> 1: bill_length_mm 1.00000000
#> 2: island -0.03846154
#> 3: bill_depth_mm -0.07122507
#> 4: sex -0.07122507
#> 5: flipper_length_mm -0.08974359
#> 6: body_mass_g -0.09116809
#> 7: year -0.10113960
= flt("importance", learner = lrn("classif.ranger", importance = "permutation"))
pfi_ranger $calculate(penguin_task)
pfi_ranger pfi_ranger
#> <FilterImportance:importance>: Importance Score
#> Task Types: classif
#> Properties: missings
#> Task Properties: -
#> Packages: mlr3, mlr3learners, ranger
#> Feature types: logical, integer, numeric, character, factor, ordered
#> feature score
#> 1: bill_length_mm 0.2502032220
#> 2: flipper_length_mm 0.1517506849
#> 3: bill_depth_mm 0.1254753264
#> 4: island 0.1247705530
#> 5: body_mass_g 0.0887346455
#> 6: sex 0.0130695250
#> 7: year -0.0002554921
1.3 Feature Effects with {effectplots}
Getting a number for “is this feature important” is nice, but often we want a better picture of the feature’s effect. Think of linear models and how we can interpret \(\beta_j\) as the linear relationship between \(X_j\) and \(Y\) — often things aren’t linear though.
One approach to visualize feature effects is via Partial Dependence Plots or preferably via Accumulated Local Effect plots (ALE), which we get from the {effectplots}
thankfully offers.
Let’s recycle our ranger
learner and plot some effects, using the partial dependence plot (PDP) as an example:
<- lrn("classif.ranger", predict_type = "prob")
lrn_ranger_cl $train(penguin_task)
lrn_ranger_cl
<- partial_dependence(
pd_penguins object = lrn_ranger_cl$model,
v = penguin_task$feature_names,
data = penguin_task$data(),
which_pred = "Adelie"
)
plot(pd_penguins)
We’re doing multiclass classification here, so while our learner predicts a probability for one of each of the three target classes (Adelie, Gentoo, Chinstrap), we need to pick one for the visualization!
1.4 Your turn! (Possibly for another time)
- Use the
bike_share
regression task to calculate the PDP - Stick with the
ranger
learner as{effectplots}
supports it directly.
# your code
The bike_sharing
task is a regression task, so make sure to switch to the regression version of the learner.
The target is bikers
, the number of people using a specific bike sharing service. More information can be found on the UCI website.