A general test for conditional independence in supervised learning algorithms. Implements a conditional variable importance measure which can be applied to any supervised learning algorithm and loss function. Provides statistical inference procedures without parametric assumptions and applies equally well to continuous and categorical predictors and outcomes.

Usage

cpi(
  task,
  learner,
  resampling = NULL,
  test_data = NULL,
  measure = NULL,
  test = "t",
  log = FALSE,
  B = 1999,
  alpha = 0.05,
  x_tilde = NULL,
  knockoff_fun = function(x) knockoff::create.second_order(as.matrix(x)),
  groups = NULL,
  verbose = FALSE
)

Arguments

task

The mlr3 prediction task; see Examples.

learner

The mlr3 learner used in CPI. If you pass a string, the learner will be created via mlr3::lrn.

resampling

Resampling strategy, mlr3 resampling object (e.g. rsmp("holdout")), "oob" (out-of-bag) or "none" (in-sample loss).

test_data

External validation data; use this instead of resampling.

measure

Performance measure (loss). By default, MSE ("regr.mse") is used for regression and log loss ("classif.logloss") for classification.

test

Statistical test to perform, one of "t" (t-test, default), "wilcox" (Wilcoxon signed-rank test), "binom" (binomial test), "fisher" (Fisher permutation test) or "bayes" (Bayesian testing, computationally intensive!). See Details.

log

Set to TRUE for multiplicative CPI (\(\lambda\)), to FALSE (default) for additive CPI (\(\Delta\)).

B

Number of permutations for Fisher permutation test.

alpha

Significance level for confidence intervals.

x_tilde

Knockoff matrix or data.frame. If not given (the default), it will be created with the function given in knockoff_fun.

knockoff_fun

Function to generate knockoffs. Default: knockoff::create.second_order with matrix argument.

groups

(Named) list with groups. Set to NULL (default) for no groups, i.e. compute CPI for each feature. See examples.

verbose

Verbose output of resampling procedure.

Value

For test = "bayes" a list of BEST objects. In any other case, a data.frame with a row for each feature and columns:

Variable/Group

Variable/group name

CPI

CPI value

SE

Standard error

test

Testing method

statistic

Test statistic (only for t-test, Wilcoxon and binomial test)

estimate

Estimated mean (for t-test), median (for Wilcoxon test), or proportion of \(\Delta\)-values greater than 0 (for binomial test).

p.value

p-value

ci.lo

Lower limit of (1 - alpha) * 100% confidence interval

Note that NA values are not an error; they result from a CPI value of 0, i.e. no difference in model performance after replacing a feature with its knockoff.
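As a small illustration of how the returned data.frame might be read, the sketch below rebuilds a few rows by hand and picks out the features whose p-value falls below alpha. The numbers are rounded copies of rows from the wine output in the Examples section; the object names `res` and `significant` are hypothetical, and the data.frame is constructed manually rather than returned by cpi().

```r
# Hand-built stand-in for a cpi() result (rounded rows from the wine example).
res <- data.frame(
  Variable = c("alcohol", "ash", "color", "proline"),
  CPI      = c(0.02613, -0.00025, 0.01158, 0.05852),
  p.value  = c(0.1333, 0.9858, 0.0594, 0.0079),
  ci.lo    = c(-0.01281, -0.00044, -0.00065, 0.01916),
  stringsAsFactors = FALSE
)

alpha <- 0.05

# Sort by CPI so the features with the largest error inflation come first.
res <- res[order(res$CPI, decreasing = TRUE), ]

# Features whose p-value is below alpha.
significant <- res$Variable[res$p.value < alpha]

# In these rows the same features also have a lower confidence limit above 0.
positive_ci <- res$Variable[res$ci.lo > 0]

print(res)
print(significant)
```

In these rows the two criteria agree: a p-value below alpha goes together with a lower confidence limit above zero.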

Details

This function computes the conditional predictive impact (CPI) of one or several features on a given supervised learning task. This represents the mean error inflation when replacing a true variable with its knockoff. Large CPI values are evidence that the feature(s) in question have high conditional variable importance -- i.e., the fitted model relies on the feature(s) to predict the outcome, even after accounting for the signal from all remaining covariates.
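To make the definition concrete, write \(L_i\) for the loss on test observation \(i\) with the original feature and \(\tilde{L}_i\) for the loss after the feature is replaced by its knockoff (this notation is assumed here for illustration). The additive quantity is then \(\Delta_i = \tilde{L}_i - L_i\), the multiplicative one is \(\lambda_i = \log(\tilde{L}_i / L_i)\), and the CPI is the mean over the test observations. The base R sketch below works through this with made-up loss values and a one-sided t-test, which is consistent with the example output on this page (statistic = CPI / SE, and only a lower confidence limit is reported). It is a sketch under these assumptions, not the package's internal code.

```r
# Hypothetical per-observation losses on a held-out set (made-up numbers).
loss_orig     <- c(0.8, 1.2, 0.5, 2.0, 1.1, 0.9, 1.6, 0.7)  # original feature
loss_knockoff <- c(1.0, 1.5, 0.4, 2.6, 1.3, 1.2, 1.5, 1.1)  # feature replaced by knockoff

alpha <- 0.05

# Additive version: difference in loss. Multiplicative version: log ratio.
delta  <- loss_knockoff - loss_orig
lambda <- log(loss_knockoff / loss_orig)

# Point estimate, standard error and one-sided t-test for the additive version.
n       <- length(delta)
cpi_hat <- mean(delta)
se      <- sd(delta) / sqrt(n)
t_stat  <- cpi_hat / se
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)
ci_lo   <- cpi_hat - qt(1 - alpha, df = n - 1) * se

# Cross-check against stats::t.test with a one-sided alternative.
tt <- t.test(delta, alternative = "greater", conf.level = 1 - alpha)

print(c(CPI = cpi_hat, SE = se, statistic = t_stat,
        p.value = p_value, ci.lo = ci_lo))
```

Replacing `delta` with `lambda` in the last part gives the corresponding quantities for the multiplicative version.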

We build on the mlr3 framework, which provides a unified interface for training models, specifying loss functions, and estimating generalization error. See the package documentation for more info.

Methods are implemented for frequentist and Bayesian inference. The default is test = "t", which is fast and powerful for most sample sizes. The Wilcoxon signed-rank test (test = "wilcox") may be more appropriate if the CPI distribution is skewed, while the binomial test (test = "binom") makes almost no distributional assumptions but may have less power. For small sample sizes, we recommend permutation tests (test = "fisher") or Bayesian methods (test = "bayes"). In the latter case, default priors are assumed. See the BEST package for more info.
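The frequentist options can be compared on a single vector of loss differences. The base R sketch below does so with made-up values. The one-sided alternatives and the sign-flipping form of the Fisher permutation test are assumptions about how such tests are typically set up, not a copy of the package internals; `delta`, `obs` and `perm` are hypothetical names.

```r
set.seed(1)

# Hypothetical loss differences (knockoff minus original), not package output.
delta <- c(0.2, 0.3, -0.1, 0.6, 0.2, 0.3, -0.1, 0.4)

# t-test on the mean difference.
p_t <- t.test(delta, alternative = "greater")$p.value

# Wilcoxon signed-rank test; exact = FALSE because the data contain ties.
p_w <- wilcox.test(delta, alternative = "greater", exact = FALSE)$p.value

# Binomial (sign) test on the proportion of differences greater than 0.
p_b <- binom.test(sum(delta > 0), length(delta),
                  alternative = "greater")$p.value

# Fisher-style permutation test: randomly flip the signs of the differences
# B times and compare the permuted means with the observed mean.
B    <- 1999
obs  <- mean(delta)
perm <- replicate(B, mean(delta * sample(c(-1, 1), length(delta),
                                         replace = TRUE)))
p_f  <- (sum(perm >= obs) + 1) / (B + 1)

print(c(t = p_t, wilcox = p_w, binom = p_b, fisher = p_f))
```

With only eight observations the binomial test uses nothing but the signs of the differences, which is one way to see why it tends to have less power than the other tests.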

For parallel execution, register a backend, e.g. with doParallel::registerDoParallel().

References

Watson, D. & Wright, M. (2021). Testing conditional independence in supervised learning algorithms. Machine Learning, 110(8): 2107-2129. doi: 10.1007/s10994-021-06030-6

Candès, E., Fan, Y., Janson, L., & Lv, J. (2018). Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. J. R. Statist. Soc. B, 80(3): 551-577. doi: 10.1111/rssb.12265

Examples

library(mlr3)
library(mlr3learners)

# Regression with linear model and holdout validation
cpi(task = tsk("mtcars"), learner = lrn("regr.lm"), 
    resampling = rsmp("holdout"))
#>    Variable          CPI          SE test   statistic     estimate   p.value
#> 1        am -3.689498320 3.778102530    t -0.97654796 -3.689498320 0.8240875
#> 2      carb  0.001322134 0.001026834    t  1.28758296  0.001322134 0.1134455
#> 3       cyl -5.307075504 3.428663109    t -1.54785563 -5.307075504 0.9236516
#> 4      disp  0.000162036 0.000150753    t  1.07484402  0.000162036 0.1538471
#> 5      drat -0.200173476 3.934572656    t -0.05087553 -0.200173476 0.5197867
#> 6      gear -0.625114335 3.197308127    t -0.19551270 -0.625114335 0.5755464
#> 7        hp  0.149518757 0.173005083    t  0.86424488  0.149518757 0.2038530
#> 8      qsec  0.194833385 1.972371391    t  0.09878129  0.194833385 0.4616320
#> 9        vs  0.303358250 0.682365676    t  0.44456845  0.303358250 0.3330494
#> 10       wt -0.166235157 4.165471594    t -0.03990788 -0.166235157 0.5155240
#>            ci.lo
#> 1  -1.053716e+01
#> 2  -5.389627e-04
#> 3  -1.152139e+01
#> 4  -1.111980e-04
#> 5  -7.331433e+00
#> 6  -6.420111e+00
#> 7  -1.640462e-01
#> 8  -3.380013e+00
#> 9  -9.334030e-01
#> 10 -7.715990e+00

# \donttest{
# Classification with logistic regression, log-loss and t-test
cpi(task = tsk("wine"), 
    learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1), 
    resampling = rsmp("holdout"), 
    measure = "classif.logloss", test = "t")
#>           Variable           CPI           SE test  statistic      estimate
#> 1       alcalinity  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 2          alcohol  2.613428e-02 2.329808e-02    t  1.1217354  2.613428e-02
#> 3              ash -2.540242e-04 1.130663e-04    t -2.2466840 -2.540242e-04
#> 4            color  1.158246e-02 7.315373e-03    t  1.5833035  1.158246e-02
#> 5         dilution  4.660688e-03 7.908992e-03    t  0.5892897  4.660688e-03
#> 6       flavanoids  1.606949e-06 8.188642e-06    t  0.1962412  1.606949e-06
#> 7              hue  7.005110e-03 7.983298e-03    t  0.8774707  7.005110e-03
#> 8        magnesium  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 9            malic  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 10   nonflavanoids  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 11         phenols  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 12 proanthocyanins  0.000000e+00 0.000000e+00    t  0.0000000  0.000000e+00
#> 13         proline  5.851581e-02 2.354280e-02    t  2.4855072  5.851581e-02
#>        p.value         ci.lo
#> 1  1.000000000  0.0000000000
#> 2  0.133298810 -0.0128096892
#> 3  0.985759085 -0.0004430204
#> 4  0.059395207 -0.0006455762
#> 5  0.278977698 -0.0085596092
#> 6  0.422553672 -0.0000120808
#> 7  0.191925655 -0.0063393934
#> 8  1.000000000  0.0000000000
#> 9  1.000000000  0.0000000000
#> 10 1.000000000  0.0000000000
#> 11 1.000000000  0.0000000000
#> 12 1.000000000  0.0000000000
#> 13 0.007920566  0.0191627691
 
# Use your own data (and out-of-bag loss with random forest)
mytask <- as_task_classif(iris, target = "Species")
mylearner <- lrn("classif.ranger", predict_type = "prob", keep.inbag = TRUE)
cpi(task = mytask, learner = mylearner, 
    resampling = "oob", measure = "classif.logloss")
#>       Variable           CPI           SE test  statistic      estimate
#> 1 Petal.Length -0.0018963480 0.0026714184    t -0.7098656 -0.0018963480
#> 2  Petal.Width  0.0175027344 0.0209443417    t  0.8356784  0.0175027344
#> 3 Sepal.Length -0.0006059492 0.0003482966    t -1.7397502 -0.0006059492
#> 4  Sepal.Width -0.0021509386 0.0040631277    t -0.5293800 -0.0021509386
#>     p.value        ci.lo
#> 1 0.7605515 -0.006317932
#> 2 0.2023370 -0.017163178
#> 3 0.9580161 -0.001182430
#> 4 0.7013352 -0.008876002
    
# Group CPI
cpi(task = tsk("iris"), 
    learner = lrn("classif.ranger", predict_type = "prob", num.trees = 10), 
    resampling = rsmp("cv", folds = 3), 
    groups = list(Sepal = 1:2, Petal = 3:4))
#>   Group         CPI          SE test statistic    estimate   p.value
#> 1 Sepal 0.006257263 0.006975786    t 0.8969975 0.006257263 0.1855836
#> 2 Petal 0.005267782 0.005031503    t 1.0469598 0.005267782 0.1484068
#>          ci.lo
#> 1 -0.005288671
#> 2 -0.003060084
# }     
if (FALSE) {      
# Bayesian testing
res <- cpi(task = tsk("iris"), 
           learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1), 
           resampling = rsmp("holdout"), 
           measure = "classif.logloss", test = "bayes")
plot(res$Petal.Length)

# Parallel execution
doParallel::registerDoParallel()
cpi(task = tsk("wine"), 
    learner = lrn("classif.glmnet", predict_type = "prob", lambda = 0.1), 
    resampling = rsmp("cv", folds = 5))
    
# Use sequential knockoffs for categorical features
# package available here: https://github.com/kormama1/seqknockoff
mytask <- as_task_regr(iris, target = "Petal.Length")
cpi(task = mytask, learner = lrn("regr.ranger"), 
    resampling = rsmp("holdout"), 
    knockoff_fun = seqknockoff::knockoffs_seq)
}