A general test for conditional independence in supervised learning algorithms. Implements a conditional variable importance measure which can be applied to any supervised learning algorithm and loss function. Provides statistical inference procedures without parametric assumptions and applies equally well to continuous and categorical predictors and outcomes.
Usage
cpi(
  task,
  learner,
  resampling = NULL,
  test_data = NULL,
  measure = NULL,
  test = "t",
  log = FALSE,
  B = 1999,
  alpha = 0.05,
  x_tilde = NULL,
  aggr_fun = mean,
  knockoff_fun = function(x) knockoff::create.second_order(as.matrix(x)),
  groups = NULL,
  verbose = FALSE
)
Arguments
- task
The prediction mlr3 task, see examples.
- learner
The mlr3 learner used in CPI. If you pass a string, the learner will be created via mlr3::lrn.
- resampling
Resampling strategy, mlr3 resampling object (e.g. rsmp("holdout")), "oob" (out-of-bag) or "none" (in-sample loss).
- test_data
External validation data, use instead of resampling.
- measure
Performance measure (loss). Per default, use MSE ("regr.mse") for regression and logloss ("classif.logloss") for classification.
- test
Statistical test to perform, one of "t" (t-test, default), "wilcox" (Wilcoxon signed-rank test), "binom" (binomial test), "fisher" (Fisher permutation test) or "bayes" (Bayesian testing, computationally intensive!). See Details.
- log
Set to TRUE for multiplicative CPI (\(\lambda\)), to FALSE (default) for additive CPI (\(\Delta\)).
- B
Number of permutations for Fisher permutation test.
- alpha
Significance level for confidence intervals.
- x_tilde
Knockoff matrix or data.frame. If not given (the default), it will be created with the function given in knockoff_fun. Also accepts a list of matrices or data.frames.
- aggr_fun
Aggregation function over replicates.
- knockoff_fun
Function to generate knockoffs. Default: knockoff::create.second_order with matrix argument.
- groups
(Named) list with groups. Set to NULL (default) for no groups, i.e. compute CPI for each feature. See examples.
- verbose
Verbose output of resampling procedure.
Value
For test = "bayes" a list of BEST objects. In any other case, a data.frame with a row for each feature and columns:
- Variable/Group
Variable/group name
- CPI
CPI value
- SE
Standard error
- test
Testing method
- statistic
Test statistic (only for t-test, Wilcoxon and binomial test)
- estimate
Estimated mean (for t-test), median (for Wilcoxon test), or proportion of \(\Delta\)-values greater than 0 (for binomial test).
- p.value
p-value
- ci.lo
Lower limit of (1 - alpha) * 100% confidence interval
Note that NA values are not an error but the result of a CPI value of 0, i.e. no difference in model performance after replacing a feature with its knockoff.
Details
This function computes the conditional predictive impact (CPI) of one or several features on a given supervised learning task. This represents the mean error inflation when replacing a true variable with its knockoff. Large CPI values are evidence that the feature(s) in question have high conditional variable importance – i.e., the fitted model relies on the feature(s) to predict the outcome, even after accounting for the signal from all remaining covariates.
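The idea of "mean error inflation" can be sketched in base R. This is only an illustration, not the package's implementation. The simulated data, the helper names (sq_loss, surrogate, cpi_sketch) and the Gaussian conditional draw used in place of a real model-X knockoff are all assumptions made for this sketch. The multiplicative variant is assumed here to be the log ratio of the two losses.

```r
set.seed(1)

# Simulated data (assumption): x1 drives y, x2 is merely correlated with x1.
n  <- 400
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n, sd = sqrt(0.75))
y  <- 2 * x1 + rnorm(n)
dat   <- data.frame(y = y, x1 = x1, x2 = x2)
train <- dat[1:200, ]
test  <- dat[201:400, ]

fit <- lm(y ~ x1 + x2, data = train)

# Per-observation squared-error loss on new data.
sq_loss <- function(model, newdata) {
  (newdata$y - predict(model, newdata = newdata))^2
}

# Stand-in for a knockoff (NOT a true model-X knockoff): redraw the feature
# from a Gaussian linear model given the other feature.
surrogate <- function(feature, other, data) {
  aux <- lm(reformulate(other, response = feature), data = data)
  predict(aux, newdata = data) + rnorm(nrow(data), sd = sigma(aux))
}

# Hypothetical helper: loss inflation after swapping in the surrogate.
cpi_sketch <- function(feature, other, log = FALSE) {
  test_tilde <- test
  test_tilde[[feature]] <- surrogate(feature, other, test)
  loss_orig  <- sq_loss(fit, test)
  loss_tilde <- sq_loss(fit, test_tilde)
  # Additive (Delta) or, by assumption, multiplicative (lambda) differences.
  d <- if (log) log(loss_tilde / loss_orig) else loss_tilde - loss_orig
  c(CPI = mean(d), SE = sd(d) / sqrt(length(d)))
}

res_x1 <- cpi_sketch("x1", "x2")
res_x2 <- cpi_sketch("x2", "x1")
print(rbind(x1 = res_x1, x2 = res_x2))
```

In this setup the value for x1 should be clearly positive, while the value for x2 should stay close to zero because the fitted model barely uses it once x1 is known.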
We build on the mlr3 framework, which provides a unified interface for training models, specifying loss functions, and estimating generalization error. See the package documentation for more info.
Methods are implemented for frequentist and Bayesian inference. The default is test = "t", which is fast and powerful for most sample sizes. The Wilcoxon signed-rank test (test = "wilcox") may be more appropriate if the CPI distribution is skewed, while the binomial test (test = "binom") requires almost no assumptions but may have less power. For small sample sizes, we recommend permutation tests (test = "fisher") or Bayesian methods (test = "bayes"). In the latter case, default priors are assumed. See the BEST package for more info.
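To make the frequentist options concrete, the following base R sketch applies each test to a vector of per-observation loss differences. The vector delta is simulated (an assumption), the one-sided alternative is chosen because it is consistent with the example output shown under Examples, and the sign-flip scheme is one common way to run a paired permutation test. The package's internal code may differ in detail.

```r
set.seed(42)

# Simulated per-observation loss differences (assumption, not package output).
delta <- rnorm(50, mean = 0.3, sd = 1)
alpha <- 0.05

# t-test: is the mean difference greater than zero?
t_res <- t.test(delta, alternative = "greater", conf.level = 1 - alpha)

# Wilcoxon signed-rank test: less sensitive to skewed differences.
w_res <- wilcox.test(delta, alternative = "greater",
                     conf.int = TRUE, conf.level = 1 - alpha)

# Binomial (sign) test: only uses whether each difference is positive.
b_res <- binom.test(sum(delta > 0), length(delta), alternative = "greater")

# Permutation test via random sign flips of the paired differences.
B    <- 1999
obs  <- mean(delta)
perm <- replicate(B, {
  signs <- sample(c(-1, 1), length(delta), replace = TRUE)
  mean(delta * signs)
})
p_fisher <- (sum(perm >= obs) + 1) / (B + 1)

p_values <- c(t = t_res$p.value, wilcox = w_res$p.value,
              binom = b_res$p.value, fisher = p_fisher)
print(p_values)
```

The lower bound of t_res$conf.int plays the role of ci.lo in the returned data.frame, under the one-sided assumption stated above.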
For parallel execution, register a backend, e.g. with doParallel::registerDoParallel().
References
Watson, D. & Wright, M. (2021). Testing conditional independence in supervised learning algorithms. Machine Learning, 110(8): 2107-2129. doi:10.1007/s10994-021-06030-6
Candès, E., Fan, Y., Janson, L., & Lv, J. (2018). Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. J. R. Statist. Soc. B, 80(3): 551-577. doi:10.1111/rssb.12265
Examples
library(mlr3)
library(mlr3learners)
# Regression with linear model and holdout validation
cpi(task = tsk("mtcars"), learner = lrn("regr.lm"),
resampling = rsmp("holdout"))
#> Variable CPI SE test statistic estimate
#> 1 am 1.511269e-01 0.1353364697 t 1.116675090 1.511269e-01
#> 2 carb 7.177261e-04 0.0008040053 t 0.892688274 7.177261e-04
#> 3 cyl -1.603035e+00 1.9235839396 t -0.833358373 -1.603035e+00
#> 4 disp -2.837083e-05 0.0002846269 t -0.099677272 -2.837083e-05
#> 5 drat -9.169321e-01 1.4362550982 t -0.638418689 -9.169321e-01
#> 6 gear -1.684137e-03 1.1431889217 t -0.001473192 -1.684137e-03
#> 7 hp 1.335019e-01 0.1400283892 t 0.953391640 1.335019e-01
#> 8 qsec -3.821204e-01 0.3403810736 t -1.122625382 -3.821204e-01
#> 9 vs -7.696940e-03 0.3517929711 t -0.021879176 -7.696940e-03
#> 10 wt 5.575406e-01 2.3763168665 t 0.234623843 5.575406e-01
#> p.value ci.lo
#> 1 0.1451227 -0.0941652253
#> 2 0.1964996 -0.0007395023
#> 3 0.7879499 -5.0894558899
#> 4 0.5387148 -0.0005442460
#> 5 0.7312266 -3.5200886244
#> 6 0.5005732 -2.0736696136
#> 7 0.1814342 -0.1202941158
#> 8 0.8560866 -0.9990478958
#> 9 0.5085126 -0.6453080236
#> 10 0.4096175 -3.7494413417
# \donttest{
# Classification with logistic regression, log-loss and t-test
cpi(task = tsk("wine"),
learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1),
resampling = rsmp("holdout"),
measure = "classif.logloss", test = "t")
#> Variable CPI SE test statistic estimate
#> 1 alcalinity -2.483667e-04 2.439210e-03 t -0.1018226 -2.483667e-04
#> 2 alcohol 3.759549e-02 1.867277e-02 t 2.0133854 3.759549e-02
#> 3 ash 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 4 color 1.706278e-02 8.204707e-03 t 2.0796326 1.706278e-02
#> 5 dilution 4.240278e-03 5.754550e-03 t 0.7368565 4.240278e-03
#> 6 flavanoids -2.073059e-05 8.422713e-06 t -2.4612727 -2.073059e-05
#> 7 hue 1.691236e-03 6.249021e-03 t 0.2706402 1.691236e-03
#> 8 magnesium 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 9 malic 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 10 nonflavanoids 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 11 phenols 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 12 proanthocyanins 0.000000e+00 0.000000e+00 t 0.0000000 0.000000e+00
#> 13 proline 5.492125e-02 3.268019e-02 t 1.6805672 5.492125e-02
#> p.value ci.lo
#> 1 0.54037566 -0.0043256347
#> 2 0.02436121 0.0063829647
#> 3 1.00000000 0.0000000000
#> 4 0.02099470 0.0033481760
#> 5 0.23208994 -0.0053787561
#> 6 0.99157962 -0.0000348096
#> 7 0.39381429 -0.0087543320
#> 8 1.00000000 0.0000000000
#> 9 1.00000000 0.0000000000
#> 10 1.00000000 0.0000000000
#> 11 1.00000000 0.0000000000
#> 12 1.00000000 0.0000000000
#> 13 0.04911281 0.0002945923
# Use your own data (and out-of-bag loss with random forest)
mytask <- as_task_classif(iris, target = "Species")
mylearner <- lrn("classif.ranger", predict_type = "prob", keep.inbag = TRUE)
cpi(task = mytask, learner = mylearner,
resampling = "oob", measure = "classif.logloss")
#> Variable CPI SE test statistic estimate
#> 1 Petal.Length -0.0003107344 0.0001992726 t -1.5593433 -0.0003107344
#> 2 Petal.Width -0.0073844725 0.0228868112 t -0.3226519 -0.0073844725
#> 3 Sepal.Length -0.0004196657 0.0004036849 t -1.0395874 -0.0004196657
#> 4 Sepal.Width -0.0019064194 0.0046740547 t -0.4078727 -0.0019064194
#> p.value ci.lo
#> 1 0.9394817 -0.0006405593
#> 2 0.6262944 -0.0452654530
#> 3 0.8498922 -0.0010878225
#> 4 0.6580237 -0.0096426555
# Group CPI
cpi(task = tsk("iris"),
learner = lrn("classif.ranger", predict_type = "prob", num.trees = 10),
resampling = rsmp("cv", folds = 3),
groups = list(Sepal = 1:2, Petal = 3:4))
#> Group CPI SE test statistic estimate p.value
#> 1 Sepal -0.001893357 0.007294353 t -0.2595647 -0.001893357 0.6022211
#> 2 Petal 0.002692659 0.003528954 t 0.7630190 0.002692659 0.2233292
#> ci.lo
#> 1 -0.01396657
#> 2 -0.00314827
# }
if (FALSE) { # \dontrun{
# Bayesian testing
res <- cpi(task = tsk("iris"),
learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1),
resampling = rsmp("holdout"),
measure = "classif.logloss", test = "bayes")
plot(res$Petal.Length)
# Parallel execution
doParallel::registerDoParallel()
cpi(task = tsk("wine"),
learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1),
resampling = rsmp("cv", folds = 5))
# Use sequential knockoffs for categorical features
# package available here: https://github.com/kormama1/seqknockoff
mytask <- as_task_regr(iris, target = "Petal.Length")
cpi(task = mytask, learner = lrn("regr.ranger"),
resampling = rsmp("holdout"),
knockoff_fun = seqknockoff::knockoffs_seq)
} # }