Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

ICCV 2017
Virginia Tech (vt.edu) · Georgia Tech (gatech.edu)


Introduction

  • Input: images
  • Output: a visualization of the class-discriminative regions
  • Improves interpretability
  • Requires no architectural changes and no re-training
  • Explains the predictions of a trained model
  • A generalization of CAM

Grad-CAM

Neuron importance weights $\alpha_k^c$

  • class: $c$
  • score for class $c$ (before the softmax): $y^c$
  • feature maps of a convolutional layer: $A^k$
  • number of pixels in each feature map: $Z$

$$\alpha_k^c = \frac{1}{Z} \sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$

$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$
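The two equations above can be sketched numerically. Here `A` and `grads` are random stand-ins for the feature maps $A^k$ and the gradients $\partial y^c / \partial A^k$ that a real network's backward pass would produce (both are assumptions for illustration):

```python
import numpy as np

# Hypothetical feature maps A^k and gradients dy^c/dA^k for K channels
# of spatial size H x W (in practice these come from a backward pass).
K, H, W = 4, 7, 7
rng = np.random.default_rng(0)
A = rng.standard_normal((K, H, W))       # feature maps A^k
grads = rng.standard_normal((K, H, W))   # gradients of the class score y^c

Z = H * W
alpha = grads.sum(axis=(1, 2)) / Z       # neuron importance weights alpha_k^c

# Weighted combination of feature maps, followed by ReLU
L_gradcam = np.maximum(0.0, np.einsum('k,khw->hw', alpha, A))  # shape (H, W)
```

The ReLU keeps only the pixels whose features push the score for class $c$ up, which is why Grad-CAM highlights positive evidence only.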

Compare with CAM

  • score for class $c$: $S^c$

$$S^c = \sum_k w_k^c \frac{1}{Z}\sum_i\sum_j A_{ij}^k$$

$$S^c = \frac{1}{Z}\sum_i\sum_j\sum_k w_k^cA_{ij}^k$$

$$L_{CAM}^c = \sum_k w_k^cA_{ij}^k$$

GAP

  • $A^k$: a feature map of a convolutional layer
  • $A^k_{ij}$: the pixel value at location $(i, j)$ of $A^k$
  • $F^k$: the GAP of the $k$-th feature map in the last convolutional layer
  • $Z = \text{height} \times \text{width}$: the number of pixels in a feature map

$$F^k=\frac{1}{Z}\sum_i\sum_jA^k_{ij}$$
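GAP is just the spatial mean of each feature map; a minimal sketch with made-up values:

```python
import numpy as np

A = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)  # K=2 maps of size 2x3
Z = A.shape[1] * A.shape[2]                             # number of pixels per map
F = A.sum(axis=(1, 2)) / Z                              # GAP: one scalar per map
# F[0] = 2.5 (mean of 0..5), F[1] = 8.5 (mean of 6..11)
```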

Final Result

  • $Y^c$: the predicted score for class $c$
  • $w^c_k$: the classifier weights

$$Y^c=\sum_k w^c_k \cdot F^k$$

From GAP

$$\frac{\partial Y^c}{\partial F^k}=\frac{\partial Y^c}{\partial A^k_{ij}}\frac{\partial A^k_{ij}}{\partial F^k}=\frac{\frac{\partial Y^c}{\partial A^k_{ij}}}{\frac{\partial F^k}{\partial A^k_{ij}}}
$$

Since $\frac{\partial F^k}{\partial A^k_{ij}} = \frac{1}{Z}$,

$$
\Rightarrow \frac{\partial Y^c}{\partial F^k}=\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$

From Final Result

$$
\frac{\partial Y^c}{\partial F^k}=\frac{\partial \sum_k w^c_k \cdot F^k}{\partial F^k}=w^c_k
$$

$$
\Rightarrow w^c_k=\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$

$$
\sum_i\sum_jw^c_k=\sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}\cdot Z
$$

$$
Z\, w^c_k = Z \sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}
$$

$$
w^c_k=\sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{ij}}
$$


Author: Tracy Liu

Posted on 2019-09-19 · Updated on 2021-03-31
