The IntegratedGradient class implements the Integrated Gradients method
(Sundararajan et al., 2017), which incorporates a reference value \(x'\)
(also known as the baseline value), analogous to the DeepLift method.
Integrated Gradients helps to uncover the relative importance of input
features in the predictions \(y = f(x)\) made by a model compared to the
prediction of the reference value \(y' = f(x')\). This is achieved through
the following formula:
$$
(x - x') \times \int_{\alpha=0}^{1} \frac{\partial f(x' + \alpha (x - x'))}{\partial x} d\alpha
$$
In simpler terms, it calculates how much each feature contributes to a
model's output by tracing a path from a baseline input \(x'\) to the actual
input \(x\) and measuring the average gradients along that path.
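The integral is approximated numerically in practice. The following base R sketch illustrates the idea on a toy function with a known gradient. The function f, the helper integrated_gradients and the midpoint rule are assumptions made for illustration; they are not the package's internal implementation, which works on torch tensors.

```r
# Toy model (an assumption for illustration): f(x) = sum(tanh(w * x))
w <- c(0.5, -1.0, 2.0)
f <- function(x) sum(tanh(w * x))

# Analytic gradient of f with respect to x
grad_f <- function(x) w * (1 - tanh(w * x)^2)

# Hypothetical helper: Riemann-sum approximation of Integrated Gradients
integrated_gradients <- function(x, x_ref, n = 50) {
  # Midpoints of n equal sub-intervals of [0, 1]
  alphas <- (seq_len(n) - 0.5) / n
  # Gradient at each point on the straight path from x_ref to x
  # (one column per integration step)
  grads <- sapply(alphas, function(a) grad_f(x_ref + a * (x - x_ref)))
  # Average gradient along the path, scaled by the input difference
  (x - x_ref) * rowMeans(grads)
}

x <- c(1.0, 2.0, -0.5)
x_ref <- c(0, 0, 0)
attribution <- integrated_gradients(x, x_ref, n = 50)

# The attributions should approximately sum to f(x) - f(x_ref)
print(attribution)
print(sum(attribution) - (f(x) - f(x_ref)))
```

Increasing n tightens the approximation at the cost of one additional gradient evaluation per step.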
As with the other gradient-based methods, the integrated gradient is by
default multiplied by the input difference \(x - x'\) to obtain an
approximate decomposition of \(y - y'\). Setting the parameter times_input
to FALSE instead returns only the averaged gradient, which describes the
sensitivity of the output.
The R6 class can also be initialized with the helper function run_intgrad,
so that no prior knowledge of R6 classes is required.
M. Sundararajan et al. (2017) Axiomatic attribution for deep networks. ICML 2017, PMLR 70, pp. 3319-3328.
Other methods: ConnectionWeights, DeepLift, DeepSHAP, ExpectedGradient, Gradient, LIME, LRP, SHAP, SmoothGrad
innsight::InterpretingMethod -> innsight::GradientBased -> IntegratedGradient
n
(integer(1))
Number of steps for the approximation of the integration path along
\(\alpha\) (default: \(50\)).
x_ref
(list)
The reference input for the IntegratedGradient method. This value is
stored as a list of torch_tensors of shape (1, dim_in) for each
input layer.
new()
Create a new instance of the IntegratedGradient R6 class. When
initialized, the Integrated Gradients method is applied to the given
data and baseline value, and the results are stored in the field result.
IntegratedGradient$new(
  converter,
  data,
  x_ref = NULL,
  n = 50,
  times_input = TRUE,
  channels_first = TRUE,
  output_idx = NULL,
  output_label = NULL,
  ignore_last_act = TRUE,
  verbose = interactive(),
  dtype = "float"
)
converter
(Converter)
An instance of the Converter class that includes the
torch-converted model and some other model-specific attributes. See
Converter for details.
data
(array, data.frame, torch_tensor or list)
The data to which the method is to be applied. These must
have the same format as the input data of the model passed to the
converter object. This means either
an array, data.frame, torch_tensor or array-like format of
size (batch_size, dim_in), if e.g., the model has only one input layer, or
a list with the corresponding input data (in the format described in the
previous point) for each of the input layers.
x_ref
(array, data.frame, torch_tensor or list)
The reference input for the IntegratedGradient method. This value
must have the same format as the input data of the model passed to the
converter object. This means either
an array, data.frame, torch_tensor or array-like format of
size (1, dim_in), if e.g., the model has only one input layer, or
a list with the corresponding input data (in the format described in the
previous point) for each of the input layers.
The default value NULL uses a reference input consisting only of zeros.
n
(integer(1))
Number of steps for the approximation of the integration path along
\(\alpha\) (default: \(50\)).
times_input
(logical(1))
Multiplies the integrated gradients by the difference between the input
features and the baseline values. By default (TRUE), the original
definition of Integrated Gradients is applied. Setting times_input = FALSE
instead returns only the approximation of the integral, which describes
the sensitivity of the output to the input features.
channels_first
(logical(1))
The channel position of the given data (argument data). If TRUE, the
channel axis is placed at the second position between the batch size and
the rest of the input axes, e.g., c(10,3,32,32) for a batch of ten images
with three channels and a height and width of 32 pixels. Otherwise
(FALSE), the channel axis is at the last position, i.e., c(10,32,32,3).
If the data has no channel axis, use the default value TRUE.
output_idx
(integer, list or NULL)
These indices specify the output nodes for which
the method is to be applied. In order to allow models with multiple
output layers, there are the following possibilities to select
the indices of the output nodes in the individual output layers:
An integer vector of indices: If the model has only one output
layer, the values correspond to the indices of the output nodes, e.g.,
c(1,3,4) for the first, third and fourth output node. If there are
multiple output layers, the indices of the output nodes from the first
output layer are considered.
A list of integer vectors of indices: If the method is to be
applied to output nodes from different layers, a list can be passed
that specifies the desired indices of the output nodes for each
output layer. Unwanted output layers have the entry NULL instead of
a vector of indices, e.g., list(NULL, c(1,3)) for the first and
third output node in the second output layer.
NULL (default): The method is applied to all output nodes in
the first output layer but is limited to the first ten as the
calculations become more computationally expensive for more output
nodes.
output_label
(character, factor, list or NULL)
These values specify the output nodes for which
the method is to be applied. Only values that were previously passed with
the argument output_names in the converter can be used. In order to
allow models with multiple
output layers, there are the following possibilities to select
the names of the output nodes in the individual output layers:
A character vector or factor of labels: If the model has only one output
layer, the values correspond to the labels of the output nodes named in the
passed Converter object, e.g.,
c("a", "c", "d") for the first, third and fourth output node if the
output names are c("a", "b", "c", "d"). If there are
multiple output layers, the names of the output nodes from the first
output layer are considered.
A list of character/factor vectors of labels: If the method is to be
applied to output nodes from different layers, a list can be passed
that specifies the desired labels of the output nodes for each
output layer. Unwanted output layers have the entry NULL instead of
a vector of labels, e.g., list(NULL, c("a", "c")) for the first and
third output node in the second output layer.
NULL (default): The method is applied to all output nodes in
the first output layer but is limited to the first ten as the
calculations become more computationally expensive for more output
nodes.
ignore_last_act
(logical(1))
Determines whether the last activation function of each output layer is
ignored. If TRUE (default), the method is applied to the output before
the last activation; if FALSE, the last activation is included.
In practice, the last activation (especially a softmax activation) is
often omitted.
verbose
(logical(1))
This logical argument determines whether a progress bar is
displayed for the calculation of the method or not. The default value is
the output of the primitive R function interactive().
dtype
(character(1))
The data type for the calculations. Use
either 'float' for torch_float or 'double' for torch_double.
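The x_ref and channels_first arguments described above expect inputs in particular shapes. The following base R sketch shows one way such inputs might be prepared before calling the method. The data frame, the random image batch and all variable names are illustrative assumptions, and the sketch uses no package functions.

```r
# Illustrative tabular data with four input features (an assumption)
tab_data <- iris[, 1:4]

# Zero baseline of shape (1, dim_in); per the argument description,
# x_ref = NULL uses zeros as the reference input
x_ref_zero <- matrix(0, nrow = 1, ncol = ncol(tab_data))

# Mean baseline: a single row holding the mean of every feature
x_ref_mean <- matrix(colMeans(tab_data), nrow = 1)

# Illustrative image batch in channels-last layout
# (batch, height, width, channels)
img_last <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3))

# Move the channel axis to the second position
# (batch, channels, height, width)
img_first <- aperm(img_last, c(1, 4, 2, 3))

print(dim(x_ref_zero))
print(dim(x_ref_mean))
print(dim(img_first))
```

An array laid out like img_last would be passed with channels_first = FALSE, while one laid out like img_first matches the default channels_first = TRUE.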
#----------------------- Example 1: Torch ----------------------------------
library(torch)
# Create nn_sequential model and data
model <- nn_sequential(
  nn_linear(5, 12),
  nn_relu(),
  nn_linear(12, 2),
  nn_softmax(dim = 2)
)
data <- torch_randn(25, 5)
ref <- torch_randn(1, 5)
# Create Converter
converter <- convert(model, input_dim = c(5))
# Apply method IntegratedGradient
int_grad <- IntegratedGradient$new(converter, data, x_ref = ref)
# You can also use the helper function `run_intgrad` for initializing
# an R6 IntegratedGradient object
int_grad <- run_intgrad(converter, data, x_ref = ref)
# Print the result as a torch tensor for first two data points
get_result(int_grad, "torch.tensor")[1:2]
#> torch_tensor
#> (1,.,.) =
#> 0.2497 0.0294
#> 0.1428 -0.0004
#> -0.0128 -0.3901
#> 0.1458 -0.0242
#> -0.1488 0.0589
#>
#> (2,.,.) =
#> 0.0392 -0.0230
#> 0.0471 -0.0209
#> 0.0719 -0.1414
#> 0.0685 -0.0407
#> 0.1522 -0.0865
#> [ CPUFloatType{2,5,2} ]
# Plot the result for both classes
plot(int_grad, output_idx = 1:2)
# Plot the boxplot of all datapoints and for both classes
boxplot(int_grad, output_idx = 1:2)
# ------------------------- Example 2: Neuralnet ---------------------------
if (require("neuralnet")) {
  library(neuralnet)
  data(iris)

  # Train a neural network
  nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width,
    iris,
    linear.output = FALSE,
    hidden = c(3, 2), act.fct = "tanh", rep = 1
  )

  # Convert the model
  converter <- convert(nn)

  # Apply IntegratedGradient with a reference input of the feature means
  x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1)
  int_grad <- run_intgrad(converter, iris[, c(3, 4)], x_ref = x_ref)

  # Get the result as a dataframe and show first 5 rows
  get_result(int_grad, type = "data.frame")[1:5, ]

  # Plot the result for the first datapoint in the data
  plot(int_grad, data_idx = 1)

  # Plot the result as boxplots
  boxplot(int_grad)
}
# ------------------------- Example 3: Keras -------------------------------
# Use `&&` so that keras is only queried if the package could be loaded
if (require("keras") && keras::is_keras_available()) {
  library(keras)

  # Make sure keras is installed properly
  is_keras_available()

  data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3))

  model <- keras_model_sequential()
  model %>%
    layer_conv_2d(
      input_shape = c(32, 32, 3), kernel_size = 8, filters = 8,
      activation = "softplus", padding = "valid") %>%
    layer_conv_2d(
      kernel_size = 8, filters = 4, activation = "tanh",
      padding = "same") %>%
    layer_conv_2d(
      kernel_size = 4, filters = 2, activation = "relu",
      padding = "valid") %>%
    layer_flatten() %>%
    layer_dense(units = 64, activation = "relu") %>%
    layer_dense(units = 2, activation = "softmax")

  # Convert the model
  converter <- convert(model)

  # Apply the IntegratedGradient method with a zero baseline and n = 20
  # iteration steps
  int_grad <- run_intgrad(converter, data,
    channels_first = FALSE,
    n = 20
  )

  # Plot the result for the first image and both classes
  plot(int_grad, output_idx = 1:2)

  # Plot the pixel-wise median of the results
  plot_global(int_grad, output_idx = 1)
}
#------------------------- Plotly plots ------------------------------------
if (require("plotly")) {
  # You can also create an interactive plot with plotly.
  # This is a suggested package, so make sure that it is installed
  library(plotly)
  boxplot(int_grad, as_plotly = TRUE)
}
#> Warning: The `boxplot()` function is only intended for tabular or signal data. It is
#> called `plot_global()` instead.