Auto-Encoding Variational Bayes

Kingma & Welling, ICLR 2014


Introduction

  • Variational Bayesian (VB)
  • Approximate posterior using MLP
  • Stochastic Gradient Variational Bayes (SGVB)
  • Auto-Encoding VB (AEVB) algorithm
  • Variational auto-encoder (VAE)

Method

problem

  • the integral of the marginal likelihood $p_\theta(x) = \int p_\theta(z)p_\theta(x|z) dz$ is intractable
  • a large dataset: parameter updates should use small minibatches rather than the full dataset

figure

  • Solid lines: generative model $p_\theta(z)p_\theta(x|z)$
  • Dashed lines: variational approximation $q_\phi(z|x)$ to the posterior $p_\theta(z|x)$

encoder

$q_\phi(z|x)$: an approximation to the intractable true posterior $p_\theta(z|x)$

decoder

$p_\theta(x|z)$: the generative model; in the paper its distribution parameters are computed from $z$ with an MLP (Bernoulli or Gaussian outputs)

The variational bound

the marginal likelihood of a single datapoint $x^{(i)}$ decomposes as

$\log p_\theta(x^{(i)}) = \log p_\theta(x^{(i)},z) - \log p_\theta(z|x^{(i)})$, an identity that holds for any value of $z$
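Taking the expectation of both sides under $q_\phi(z|x^{(i)})$ (the left-hand side does not depend on $z$), then adding and subtracting $\log q_\phi(z|x^{(i)})$ inside the expectation, splits the marginal likelihood into a KL term and a lower bound:

$\log p_\theta(x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log \frac{q_\phi(z|x^{(i)})}{p_\theta(z|x^{(i)})}\right] + \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log \frac{p_\theta(x^{(i)},z)}{q_\phi(z|x^{(i)})}\right]$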

$\log p_\theta(x^{(i)}) = D_{KL}(q_{\phi}(z|x^{(i)})||p_{\theta}(z|x^{(i)})) + \mathcal{L}(\theta,\phi;x^{(i)})$

$\log p_\theta(x^{(i)}) \geq \mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_{\theta}(x^{(i)},z)]$

$\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_{\theta}(x^{(i)}|z) + \log p_{\theta}(z)]$

$\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_{\phi}(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]$
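As a sanity check on this decomposition, here is a minimal numeric sketch (not from the paper) using a conjugate Gaussian toy model where every term has a closed form, so $\log p_\theta(x) = D_{KL} + \mathcal{L}$ can be verified directly:

```python
# Toy model (my own choice, everything tractable):
#   p(z) = N(0, 1),  p(x|z) = N(z, 1)  =>  p(x) = N(0, 2)
#   true posterior: p(z|x) = N(x/2, 1/2)
import numpy as np

x = 1.3                      # an observed datapoint
mu_q, s_q = 0.4, 0.9         # an arbitrary Gaussian q(z|x) = N(mu_q, s_q^2)

# Closed-form pieces of the lower bound L (standard Gaussian integrals):
E_log_px_z = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu_q) ** 2 + s_q ** 2)  # E_q[log p(x|z)]
E_log_pz   = -0.5 * np.log(2 * np.pi) - 0.5 * (mu_q ** 2 + s_q ** 2)        # E_q[log p(z)]
entropy_q  = 0.5 * np.log(2 * np.pi * np.e * s_q ** 2)                      # E_q[-log q(z|x)]
elbo = E_log_px_z + E_log_pz + entropy_q

# KL(q(z|x) || p(z|x)) between two univariate Gaussians
mu_p, s_p = x / 2, np.sqrt(0.5)
kl = np.log(s_p / s_q) + (s_q ** 2 + (mu_q - mu_p) ** 2) / (2 * s_p ** 2) - 0.5

log_px = -0.5 * np.log(2 * np.pi * 2) - x ** 2 / 4   # log N(x; 0, 2)
print(log_px, kl + elbo)     # both print ~ -1.688: log p(x) = KL + L
```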

The SGVB estimator and AEVB algorithm

do gradient ascent on the lower bound $\mathcal{L}$ w.r.t. both $\theta$ and $\phi$

express $z$ as a differentiable transformation of an auxiliary noise variable $\epsilon$, so the expectation can be differentiated w.r.t. $\phi$

reparameterization

$\tilde{z} = g_\phi(\epsilon, x)$ with $\epsilon \sim p(\epsilon)$

Monte Carlo estimates

$\mathbb{E}_{q_\phi(z|x^{(i)})}[f(z)] = \mathbb{E}_{p(\epsilon)}[f(g_\phi(\epsilon, x^{(i)}))] \simeq \frac{1}{L} \sum_{l=1}^L f(g_\phi(\epsilon^{(l)}, x^{(i)}))$

where $\epsilon^{(l)} \sim p(\epsilon)$, i.e. drawing $L$ noise samples per datapoint
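A minimal sketch of this estimator (my own toy setup, not the paper's): take $q_\phi(z|x) = \mathcal{N}(\mu, \sigma^2)$, so $g_\phi(\epsilon, x) = \mu + \sigma\epsilon$ with $p(\epsilon) = \mathcal{N}(0, 1)$, and a test integrand $f(z) = z^2$ whose exact expectation $\mu^2 + \sigma^2$ is known:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, L = 2.0, 0.5, 100_000

eps = rng.standard_normal(L)       # eps^(l) ~ p(eps) = N(0, 1)
z = mu + sigma * eps               # z^(l) = g_phi(eps^(l), x)
estimate = (z ** 2).mean()         # (1/L) * sum_l f(z^(l))

print(estimate, mu ** 2 + sigma ** 2)   # ~4.25 vs. exact 4.25
```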

generic SGVB

from

$\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_{\theta}(x^{(i)},z)]$

to

$\tilde{\mathcal{L}}^A(\theta,\phi;x^{(i)}) = \frac{1}{L}\sum_{l=1}^L\left[\log p_\theta(x^{(i)}, z^{(i,l)}) - \log q_\phi(z^{(i,l)}|x^{(i)})\right]$

where $z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)})$ and $\epsilon^{(i,l)} \sim p(\epsilon)$
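A sketch of $\tilde{\mathcal{L}}^A$ on the same toy conjugate model as above ($p(z) = \mathcal{N}(0,1)$, $p_\theta(x|z) = \mathcal{N}(z,1)$, $q_\phi(z|x) = \mathcal{N}(\mu_q, s_q^2)$); the densities are hand-coded, so this only illustrates the estimator's form:

```python
import numpy as np

def log_normal(v, mean, std):
    # log density of N(v; mean, std^2)
    return -0.5 * np.log(2 * np.pi * std ** 2) - (v - mean) ** 2 / (2 * std ** 2)

rng = np.random.default_rng(0)
x, mu_q, s_q, L = 1.3, 0.4, 0.9, 100_000

eps = rng.standard_normal(L)          # eps^(i,l) ~ N(0, 1)
z = mu_q + s_q * eps                  # z^(i,l) = g_phi(eps^(i,l), x^(i))

log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)   # log p(x, z)
log_q = log_normal(z, mu_q, s_q)                              # log q(z|x)
print((log_joint - log_q).mean())     # ~ -1.82, the analytic L from the check above
```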

second version of the SGVB estimator

  • the KL term $D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z))$ can often be integrated analytically (e.g. when both $q_\phi$ and the prior are Gaussian)
  • typically has less variance than the generic estimator $\tilde{\mathcal{L}}^A$, since only the reconstruction term is sampled

from

$\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_{\phi}(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]$

to

$\tilde{\mathcal{L}}^B(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z)) + \frac{1}{L}\sum_{l=1}^L \log p_\theta(x^{(i)}|z^{(i,l)})$

where $z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)})$ and $\epsilon^{(i,l)} \sim p(\epsilon)$
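The same toy model with estimator $\tilde{\mathcal{L}}^B$: since $q_\phi$ is Gaussian and the prior is $\mathcal{N}(0, 1)$, the KL term has a closed form and only the reconstruction term is sampled (a sketch, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x, mu_q, s_q, L = 1.3, 0.4, 0.9, 100_000

# Analytic KL(N(mu_q, s_q^2) || N(0, 1))
kl = -0.5 * (1 + np.log(s_q ** 2) - mu_q ** 2 - s_q ** 2)

z = mu_q + s_q * rng.standard_normal(L)                  # reparameterized samples
log_px_z = -0.5 * np.log(2 * np.pi) - (x - z) ** 2 / 2   # log p(x|z) = log N(x; z, 1)
print(-kl + log_px_z.mean())   # same target (~ -1.82), but a lower-variance estimate
```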

multiple datapoints

a random minibatch $X^M = \{x^{(i)}\}_{i=1}^M$ of $M$ datapoints drawn from the full dataset $X$ with $N$ datapoints

$\mathcal{L}(\theta,\phi;X) \simeq \tilde{\mathcal{L}}^M(\theta,\phi;X^M) = \frac{N}{M}\sum_{i=1}^M \tilde{\mathcal{L}}(\theta,\phi;x^{(i)})$
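As a sketch (assuming a hypothetical per-datapoint helper `elbo_estimate` implementing $\tilde{\mathcal{L}}^A$ or $\tilde{\mathcal{L}}^B$), the minibatch estimator just rescales the minibatch sum by $N/M$:

```python
def minibatch_bound(minibatch, N, elbo_estimate):
    """L^M: estimate the full-data bound from M of N datapoints, (N/M) * sum_i L~(x^(i))."""
    M = len(minibatch)
    return (N / M) * sum(elbo_estimate(x) for x in minibatch)
```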

AEVB

VAE

$\log q_\phi(z|x^{(i)}) = \log \mathcal{N}(z;\mu^{(i)},\sigma^{2(i)}I)$

$\mu^{(i)}$ and $\sigma^{(i)}$ are outputs of the encoder MLP, a nonlinear function of the datapoint $x^{(i)}$ parameterized by $\phi$

sampling $z^{(i, l)}$

$z^{(i, l)} \sim q_\phi(z|x^{(i)})$

$z^{(i, l)} = g_\phi(x^{(i)}, \epsilon^{(l)}) = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)}$ (element-wise product)

where $\epsilon^{(l)} \sim N(0, I)$

$\mathcal{L}(\theta,\phi;x^{(i)}) \simeq \frac{1}{2} \sum_{j=1}^J\left(1 + \log((\sigma_j^{(i)})^2) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\right) + \frac{1}{L}\sum_{l=1}^L \log p_\theta(x^{(i)}|z^{(i,l)})$

where $J$ is the dimensionality of $z$
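Putting the pieces together, a minimal PyTorch sketch of the resulting VAE; the layer sizes, Tanh activations, and Bernoulli decoder for binarized inputs are my assumptions in the spirit of the paper's MNIST setup, not a reproduction of it:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.enc_mu = nn.Linear(h_dim, z_dim)       # mu^(i)
        self.enc_logvar = nn.Linear(h_dim, z_dim)   # log (sigma^(i))^2
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.Tanh(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid(),  # Bernoulli means for p_theta(x|z)
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        eps = torch.randn_like(mu)                  # eps^(l) ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps      # reparameterization (L = 1)
        return self.dec(z), mu, logvar

def neg_elbo(x, x_mean, mu, logvar):
    # -E[log p(x|z)] for a Bernoulli decoder (binary cross-entropy) ...
    rec = nn.functional.binary_cross_entropy(x_mean, x, reduction="sum")
    # ... plus the analytic KL: -1/2 * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Usage: one gradient step on a dummy batch of "binarized images".
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784).round()
x_mean, mu, logvar = model(x)
loss = neg_elbo(x, x_mean, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()
```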

Experiments

comparing AEVB with the wake-sleep algorithm on MNIST and Frey Face

likelihood lower bound

AEVB converged faster and to a better lower bound than wake-sleep on both datasets

marginal likelihood

with a low-dimensional latent space, the estimated marginal likelihood also favored AEVB over wake-sleep and Monte Carlo EM (MCEM)
