Calculate Integrated Gradients using native torch autograd. This function provides a lightweight alternative to run_intgrad and IntegratedGradient that works directly with any torch::nn_module object.

torch_intgrad(
  model,
  data,
  x_ref = NULL,
  output_idx = NULL,
  n = 50,
  times_input = TRUE,
  dtype = "float",
  return_object = FALSE
)

Arguments

model

(nn_module)
A torch model. Must be an instance of nn_module.

data

(torch_tensor, array, or matrix)
Input data for which to calculate integrated gradients.

x_ref

(torch_tensor, array, or matrix)
Reference input (baseline). If NULL, a tensor of zeros is used. Must have shape (1, ...), where ... matches the input dimensions of data. Default: NULL.

output_idx

(integer)
Index or indices of output nodes. Default: NULL (all outputs).

n

(integer(1))
Number of steps for approximating the integral. Default: 50.

times_input

(logical(1))
If TRUE, multiplies the path-averaged gradients by (data - x_ref). This is the standard Integrated Gradients formulation. If FALSE, only the averaged gradients are returned. Default: TRUE.

dtype

(character(1))
Data type: "float" or "double". Default: "float".

return_object

(logical(1))
If TRUE, returns an InterpretingMethod object with methods like plot() and get_result(). If FALSE (default), returns a raw torch_tensor.

Value

If return_object = FALSE (default): A torch_tensor containing the integrated gradients with shape (batch_size, ..., n_outputs). If return_object = TRUE: An InterpretingMethod object.

Details

Integrated Gradients calculates feature importance by integrating gradients along the path from a baseline \(x'\) to the input \(x\):

$$(x - x') \times \int_{\alpha=0}^{1} \frac{\partial f(x' + \alpha (x - x'))}{\partial x} d\alpha$$

The integral is approximated using n interpolation steps.
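
One common discretization is the Riemann sum given by Sundararajan et al., which replaces the integral with an average of the gradients at n equally spaced points on the path. Whether torch_intgrad evaluates the gradients at exactly these points is an assumption here, not something this page states:

$$(x - x') \times \frac{1}{n} \sum_{k=1}^{n} \frac{\partial f(x' + \frac{k}{n} (x - x'))}{\partial x}$$

With times_input = FALSE, the leading factor \((x - x')\) is dropped and only the averaged gradient remains.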

References

M. Sundararajan et al. (2017) Axiomatic attribution for deep networks. ICML 2017, PMLR 70, pp. 3319-3328.

Examples

library(torch)

model <- nn_sequential(nn_linear(10, 3))
data <- torch_randn(5, 10)

# Use zero baseline (default)
int_grads <- torch_intgrad(model, data)

# Use custom baseline
baseline <- torch_zeros(1, 10)
int_grads <- torch_intgrad(model, data, x_ref = baseline)

# More integration steps for higher accuracy
int_grads <- torch_intgrad(model, data, n = 100)
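
To make the approximation concrete, the following is a minimal sketch of Integrated Gradients written directly against torch autograd. It is not the package's implementation: manual_intgrad and its out_idx argument are hypothetical names used only for illustration, and the right Riemann sum is an assumed discretization.

```r
library(torch)

# Hypothetical helper (not part of the package): right Riemann sum
# approximation of Integrated Gradients for a single output node.
manual_intgrad <- function(model, x, x_ref, out_idx = 1, n = 50) {
  total <- torch_zeros_like(x)
  for (k in seq_len(n)) {
    # Point k/n of the way along the straight path from x_ref to x
    x_k <- (x_ref + (k / n) * (x - x_ref))$detach()$requires_grad_(TRUE)
    out <- model(x_k)
    # Summing over the batch yields per-sample gradients in one pass,
    # because each sample's output depends only on its own input row
    grad <- autograd_grad(out[, out_idx]$sum(), x_k)[[1]]
    total <- total + grad
  }
  # Average the gradients, then scale by the input difference
  (x - x_ref) * total / n
}

model <- nn_sequential(nn_linear(10, 3))
x <- torch_randn(5, 10)
x_ref <- torch_zeros(1, 10)

ig <- manual_intgrad(model, x, x_ref, out_idx = 1)
```

For a purely linear model such as this one, the gradient is constant along the path, so the attributions of each sample should sum to the difference between the model output at the input and at the baseline, up to floating-point error.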