Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
ICCV 2017
Authors from Virginia Tech (vt.edu) and Georgia Tech (gatech.edu)
Contents
Introduction
- Input: Images
- Output: Visualization of class-discriminative regions
- Interpretability
- Without architectural changes or re-training
- Gives an explanation for a model's prediction
- A generalization of CAM
Grad-CAM
Neuron importance weights $\alpha_k^c$
- class: $c$
- the score for class $c$ (before the softmax): $y^c$
- feature maps: $A^k$
- the number of pixels: $Z$
$$\alpha_k^c = \frac{1}{Z} \sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
$$L_{Grad-CAM}^c = ReLU(\sum_k \alpha_k^cA^k)$$
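The two formulas above can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the tiny CNN, its layer sizes, and the input shape are all made up for the example. `alpha` is the spatial mean of $\partial y^c/\partial A^k_{ij}$, and the map is the ReLU of the $\alpha$-weighted channel sum.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative tiny CNN (hypothetical architecture): the last conv layer
# produces the feature maps A^k used by Grad-CAM.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # -> A^k
        self.fc = nn.Linear(8, num_classes)                    # after GAP

    def forward(self, x):
        a = F.relu(self.conv(x))       # feature maps A^k, shape (N, K, H, W)
        pooled = a.mean(dim=(2, 3))    # global average pooling
        return self.fc(pooled), a

def grad_cam(model, x, c):
    """L^c_{Grad-CAM} = ReLU(sum_k alpha^c_k A^k)."""
    scores, a = model(x)
    a.retain_grad()                    # keep grad of the non-leaf tensor A^k
    scores[0, c].backward()            # fills a.grad with dy^c / dA^k_{ij}
    alpha = a.grad.mean(dim=(2, 3), keepdim=True)  # (1/Z) sum_ij of grads
    return F.relu((alpha * a).sum(dim=1))          # weighted sum over k, ReLU

x = torch.randn(1, 3, 16, 16)
cam = grad_cam(TinyCNN(), x, c=3)
print(cam.shape)  # torch.Size([1, 16, 16])
```

In practice the map is then upsampled to the input resolution and overlaid on the image; the ReLU keeps only features with a positive influence on class $c$.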
Compare with CAM
- the score for class $c$: $S^c$
$$S^c = \sum_k w_k^c \frac{1}{Z}\sum_i\sum_j A_{ij}^k$$
$$S^c = \frac{1}{Z}\sum_i\sum_j\sum_k w_k^cA_{ij}^k$$
$$L_{CAM}^c = \sum_k w_k^cA_{ij}^k$$
GAP (Global Average Pooling)
- $A^k$: a feature map of a convolutional layer
- $A^k_{ij}$: a pixel value of $A^k$
- $F^k$: GAP of the feature map in the last layer
- $Z$: the number of pixels in $A^k$ (height $\times$ width)
$$F^k=\frac{1}{Z}\sum_i\sum_jA^k_{ij}$$
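The GAP formula in NumPy, on toy feature maps (shapes are illustrative):

```python
import numpy as np

# Toy feature maps A^k: K channels of H x W "pixels" (made-up sizes).
K, H, W = 4, 3, 3
A = np.arange(K * H * W, dtype=float).reshape(K, H, W)
Z = H * W                          # number of pixels per feature map
F = A.sum(axis=(1, 2)) / Z         # F^k = (1/Z) sum_ij A^k_ij
print(F.shape)  # (4,)
```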
Final Result
- $Y^c$: the prediction
- $w^c_k$: the weights of the final fully-connected layer
$$Y^c=\sum_k w^c_k \cdot F^k$$
From GAP
$$\frac{\partial Y^c}{\partial F^k}=\frac{\partial Y^c}{\partial A^k_{ij}}\frac{\partial A^k_{ij}}{\partial F^k}=\frac{\frac{\partial Y^c}{\partial A^k_{ij}}}{\frac{\partial F^k}{\partial A^k_{ij}}}
$$
$$
\Rightarrow \frac{\partial Y^c}{\partial F^k}=\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$
From Final Result
$$
\frac{\partial Y^c}{\partial F^k}=\frac{\partial}{\partial F^k}\sum_{k'} w^c_{k'} F^{k'}=w^c_k
$$
$$
\Rightarrow w^c_k=\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$
$$
\sum_i\sum_jw^c_k=\sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$
$$
Z\,w^c_k=Z \sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}
$$
$$
w^c_k=\sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}
$$
- So $w^c_k$ equals the Grad-CAM weight $\alpha^c_k$ up to the constant factor $1/Z$, which only rescales the map: Grad-CAM is a strict generalization of CAM.
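The derivation can be checked numerically with autograd. A sketch, with made-up shapes: for $Y^c=\sum_k w^c_k F^k$ with $F^k$ the GAP of $A^k$, summing the gradients $\partial Y^c/\partial A^k_{ij}$ over all spatial positions recovers the FC weights $w^c_k$.

```python
import torch

# Toy GAP + linear head (illustrative sizes): K maps of H x W, C classes.
K, H, W, C = 4, 5, 5, 3
A = torch.randn(K, H, W, requires_grad=True)  # feature maps A^k
w = torch.randn(C, K)                         # FC weights w^c_k
F_vec = A.mean(dim=(1, 2))                    # F^k = (1/Z) sum_ij A^k_ij
Y = w @ F_vec                                 # Y^c = sum_k w^c_k F^k
c = 1
Y[c].backward()                               # fills A.grad with dY^c/dA^k_ij
recovered = A.grad.sum(dim=(1, 2))            # sum_ij dY^c/dA^k_ij
print(torch.allclose(recovered, w[c]))        # True
```

Each gradient entry is $w^c_k/Z$; summing over the $Z$ positions cancels the $1/Z$ from the GAP, matching the last line of the derivation.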
Reference
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization." ICCV 2017.