Calculate Expected Gradients (GradSHAP) using native torch autograd. This function is a lightweight alternative to run_expgrad and ExpectedGradient: since it differentiates the torch model directly and avoids any model conversion overhead, it is more efficient for torch models and can be used with any torch::nn_module without restrictions on architecture or layers.

torch_expgrad(
  model,
  data,
  data_ref = NULL,
  output_idx = NULL,
  n = 50,
  dtype = "float",
  return_object = FALSE
)

Arguments

model

(nn_module)
A torch model.

data

(torch_tensor, array, or matrix)
Input data.

data_ref

(torch_tensor, array, or matrix)
Reference dataset used to estimate the expectation over reference values. If NULL, zeros are used. Default: NULL.

output_idx

(integer)
Index or indices of output nodes. Default: NULL (all outputs).

n

(integer(1))
Number of reference samples and integration steps. Default: 50.

dtype

(character(1))
Data type: "float" or "double". Default: "float".

return_object

(logical(1))
If TRUE, returns an InterpretingMethod object with methods like plot() and get_result(). If FALSE (default), returns a raw torch_tensor.

Value

If return_object = FALSE (default): A torch_tensor containing the expected gradients with shape (batch_size, ..., n_outputs). If return_object = TRUE: An InterpretingMethod object.

Details

Expected Gradients extends Integrated Gradients by averaging over multiple reference values from a distribution:

$$E_{x'\sim X',\, \alpha \sim U(0,1)}\left[(x - x') \odot \frac{\partial f(x' + \alpha (x - x'))}{\partial x}\right]$$

where the product is taken elementwise. The resulting attributions are approximate Shapley values, which is why the method is also known as GradSHAP.
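The expectation above can be estimated by Monte Carlo sampling with native torch autograd. The following sketch (not the package's implementation; `expgrad_single` and its arguments are illustrative names) draws n (reference, alpha) pairs for a single input row, evaluates the gradient at each interpolated point, and averages the elementwise products:

```r
library(torch)

# Monte Carlo sketch of the expectation for one input x (shape 1 x p),
# given a model and a reference dataset ref (shape m x p).
expgrad_single <- function(model, x, ref, n = 50) {
  p <- x$size(2)
  idx <- sample(ref$size(1), n, replace = TRUE)
  x_ref <- ref[idx, , drop = FALSE]            # n x p sampled references x'
  alpha <- torch_rand(n)$unsqueeze(2)          # n x 1 draws from U(0, 1)
  x_rep <- x$expand(c(n, p))
  # Interpolated points x' + alpha * (x - x'), tracked for autograd
  interp <- (x_ref + alpha * (x_rep - x_ref))$requires_grad_(TRUE)
  # Sum over the batch so autograd_grad returns one gradient per sample;
  # for a multi-output model, select one output column before summing.
  out <- model(interp)$sum()
  grads <- autograd_grad(out, interp)[[1]]     # n x p gradients
  ((x_rep - x_ref) * grads)$mean(dim = 1)      # average over the n samples
}
```

Each Monte Carlo sample serves as both a reference draw and an integration step, which is why a single argument n controls both in torch_expgrad().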

References

G. Erion et al. (2021) Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature Machine Intelligence 3, pp. 620-631.

Examples

library(torch)

model <- nn_sequential(nn_linear(10, 3))
data <- torch_randn(5, 10)
references <- torch_randn(100, 10)  # Reference distribution

# Calculate Expected Gradients
exp_grads <- torch_expgrad(model, data, data_ref = references)
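Building on the call above, the remaining arguments can be combined as follows (a usage sketch based on the argument descriptions; the plot() call assumes the InterpretingMethod interface described in the Value section):

```r
# Attribute only the first output node and request a result object
res <- torch_expgrad(model, data, data_ref = references,
                     output_idx = 1, return_object = TRUE)

# InterpretingMethod convenience methods
result <- res$get_result()
plot(res)
```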