Calculate Expected Gradients (GradSHAP) using native torch autograd. This function is a lightweight alternative to run_expgrad and ExpectedGradient: since it differentiates the torch model directly and avoids any model conversion overhead, it is more efficient for torch models and can be used with any torch::nn_module without restrictions on architecture or layers.

torch_expgrad(
  model,
  data,
  data_ref = NULL,
  output_idx = NULL,
  n = 50,
  dtype = "float",
  return_object = FALSE
)

Arguments

model

(nn_module)
A torch model.

data

(torch_tensor, array, or matrix)
Input data.

data_ref

(torch_tensor, array, or matrix)
Reference dataset used to estimate the expectation over reference values. If NULL, zeros are used. Default: NULL.

output_idx

(integer)
Index or indices of output nodes. Default: NULL (all outputs).

n

(integer(1))
Number of reference samples and integration steps. Default: 50.

dtype

(character(1))
Data type: "float" or "double". Default: "float".

return_object

(logical(1))
If TRUE, returns an InterpretingMethod object with methods like plot() and get_result(). If FALSE (default), returns a raw torch_tensor.

Value

If return_object = FALSE (default): A torch_tensor containing the expected gradients with shape (batch_size, ..., n_outputs). If return_object = TRUE: An InterpretingMethod object.

Details

Expected Gradients extends Integrated Gradients by averaging over multiple reference values from a distribution:

$$E_{x'\sim X',\, \alpha \sim U(0,1)}\left[(x - x') \odot \frac{\partial f(x' + \alpha (x - x'))}{\partial x}\right]$$

where the product is taken elementwise. The resulting attributions are approximate Shapley values, which is why the method is also known as GradSHAP.
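The expectation above can be estimated by Monte Carlo sampling with native torch autograd. The following sketch (not the package's implementation; `expgrad_single` and its arguments are illustrative names) draws n (reference, alpha) pairs for a single input row, evaluates the gradient at each interpolated point, and averages the elementwise products:

```r
library(torch)

# Monte Carlo sketch of the expectation for one input x (shape 1 x p),
# given a model and a reference dataset ref (shape m x p).
expgrad_single <- function(model, x, ref, n = 50) {
  p <- x$size(2)
  idx <- sample(ref$size(1), n, replace = TRUE)
  x_ref <- ref[idx, , drop = FALSE]            # n x p sampled references x'
  alpha <- torch_rand(n)$unsqueeze(2)          # n x 1 draws from U(0, 1)
  x_rep <- x$expand(c(n, p))
  # Interpolated points x' + alpha * (x - x'), tracked for autograd
  interp <- (x_ref + alpha * (x_rep - x_ref))$requires_grad_(TRUE)
  # Sum over the batch so autograd_grad returns one gradient per sample;
  # for a multi-output model, select one output column before summing.
  out <- model(interp)$sum()
  grads <- autograd_grad(out, interp)[[1]]     # n x p gradients
  ((x_rep - x_ref) * grads)$mean(dim = 1)      # average over the n samples
}
```

Each Monte Carlo sample serves as both a reference draw and an integration step, which is why a single argument n controls both in torch_expgrad().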

References

G. Erion et al. (2021) Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature Machine Intelligence 3, pp. 620-631.

Examples

library(torch)

model <- nn_sequential(nn_linear(10, 3))
data <- torch_randn(5, 10)
references <- torch_randn(100, 10)  # Reference distribution

# Calculate Expected Gradients
exp_grads <- torch_expgrad(model, data, data_ref = references)
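Building on the call above, the remaining arguments can be combined as follows (a usage sketch based on the argument descriptions; the plot() call assumes the InterpretingMethod interface described in the Value section):

```r
# Attribute only the first output node and request a result object
res <- torch_expgrad(model, data, data_ref = references,
                     output_idx = 1, return_object = TRUE)

# InterpretingMethod convenience methods
result <- res$get_result()
plot(res)
```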