
Auto-Encoding Variational Bayes

Diederik P. Kingma, Max Welling. ICLR 2014.


Introduction

  • Variational Bayesian (VB)
  • Approximate posterior using MLP
  • Stochastic Gradient Variational Bayes (SGVB)
  • Auto-Encoding VB (AEVB) algorithm
  • Variational auto-encoder (VAE)

Method

problem

  • the integral of the marginal likelihood p_\theta(x) = \int p_\theta(z)\,p_\theta(x|z)\,dz is intractable
  • a large dataset: parameters must be updated with small minibatches rather than full-batch optimization

graphic

  • Solid lines: generative model p_\theta(z)p_\theta(x|z)
  • Dashed lines: variational approximation q_\phi(z|x) to the posterior p_\theta(z|x)

encoder

q_\phi(z|x): an approximation to the intractable true posterior p_\theta(z|x)

decoder

p_\theta(x|z)

The variational bound

the marginal likelihood of a datapoint x^{(i)} can be written, for any z, as

\log p_\theta(x^{(i)}) = \log p_\theta(x^{(i)},z) - \log p_\theta(z|x^{(i)})

Taking the expectation of both sides under q_\phi(z|x^{(i)}) (the left-hand side does not depend on z) splits it into a KL term and the variational lower bound:

\log p_\theta(x^{(i)}) = D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z|x^{(i)})) + \mathcal{L}(\theta,\phi;x^{(i)})

Since the KL divergence is non-negative, \mathcal{L} is a lower bound on the marginal likelihood:

\log p_\theta(x^{(i)}) \geq \mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_\theta(x^{(i)},z)]

Expanding \log p_\theta(x^{(i)},z) = \log p_\theta(x^{(i)}|z) + \log p_\theta(z):

\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_\theta(x^{(i)}|z) + \log p_\theta(z)]

and regrouping the prior and posterior terms into a KL divergence:

\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]

The SGVB estimator and AEVB algorithm

do gradient ascent on \mathcal{L}(\theta,\phi;x^{(i)}) w.r.t. both \theta and \phi

to get a differentiable Monte Carlo estimate of the expectation, express z through a differentiable transformation of an auxiliary noise variable \epsilon

reparameterization

\tilde{z} = g_\phi(\epsilon, x) with \epsilon \sim p(\epsilon)

Monte Carlo estimates

\mathbb{E}_{q_\phi(z|x^{(i)})}[f(z)] = \mathbb{E}_{p(\epsilon)}[f(g_\phi(\epsilon, x^{(i)}))] \simeq \frac{1}{L} \sum_{l=1}^L f(g_\phi(\epsilon^{(l)}, x^{(i)}))

where \epsilon^{(l)} \sim p(\epsilon), drawing L noise samples
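
As a concrete illustration, here is a minimal NumPy sketch of this estimator for a diagonal-Gaussian q_\phi, where g_\phi(\epsilon, x) = \mu(x) + \sigma(x)\epsilon; the names mu, sigma, and f are placeholders, not from the paper:

```python
import numpy as np

def mc_expectation(f, mu, sigma, L=10, rng=np.random.default_rng(0)):
    """Estimate E_{q_phi(z|x)}[f(z)] via the reparameterization trick:
    z = mu + sigma * eps, eps ~ N(0, I), so the sampling noise is
    separated from (mu, sigma) and gradients can flow through z."""
    eps = rng.standard_normal((L,) + mu.shape)  # eps^{(l)} ~ p(eps) = N(0, I)
    z = mu + sigma * eps                        # z^{(l)} = g_phi(eps^{(l)}, x)
    return np.mean([f(z_l) for z_l in z], axis=0)

# sanity check: E[z^2] = mu^2 + sigma^2 = 1.25 for q = N(1, 0.5^2)
print(mc_expectation(lambda z: z**2, np.array([1.0]), np.array([0.5]), L=100_000))
```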

generic SGVB

from

\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}[-\log q_\phi(z|x^{(i)}) + \log p_\theta(x^{(i)},z)]

to

\tilde{\mathcal{L}}^A(\theta,\phi;x^{(i)}) = \frac{1}{L}\sum_{l=1}^L \left( \log p_\theta(x^{(i)}, z^{(i,l)}) - \log q_\phi(z^{(i,l)}|x^{(i)}) \right)

where z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)}) and \epsilon^{(i,l)} \sim p(\epsilon)
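
A sketch of \tilde{\mathcal{L}}^A in NumPy, again assuming a diagonal-Gaussian q_\phi and a standard-normal prior p_\theta(z); the decoder log-likelihood log_p_x_given_z is a hypothetical callable, not specified by the paper:

```python
import numpy as np

def log_normal(z, mu, sigma):
    """log N(z; mu, sigma^2 I), summed over the latent dimensions."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - 0.5 * ((z - mu) / sigma) ** 2, axis=-1)

def sgvb_a(log_p_x_given_z, x, mu, sigma, L=5, rng=np.random.default_rng(0)):
    """Generic estimator: (1/L) * sum_l [log p(x, z_l) - log q(z_l|x)],
    using log p(x, z) = log p(x|z) + log p(z)."""
    eps = rng.standard_normal((L,) + mu.shape)   # eps^{(i,l)} ~ N(0, I)
    z = mu + sigma * eps                         # z^{(i,l)} = g_phi(eps, x)
    log_pz = log_normal(z, np.zeros_like(mu), np.ones_like(sigma))  # log p(z)
    log_qz = log_normal(z, mu, sigma)                               # log q(z|x)
    return np.mean(log_p_x_given_z(x, z) + log_pz - log_qz)
```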

second version of the SGVB

  • D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z)) can be integrated analytically
  • has less variance than the generic one

from

\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_\phi(z|x^{(i)})}[\log p_\theta(x^{(i)}|z)]

to

\tilde{\mathcal{L}}^B(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z)) + \frac{1}{L}\sum_{l=1}^L \log p_\theta(x^{(i)}|z^{(i,l)})

where z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)}) and \epsilon^{(i,l)} \sim p(\epsilon)
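
When q_\phi(z|x^{(i)}) = \mathcal{N}(\mu^{(i)}, \sigma^{2(i)}I) and the prior is p_\theta(z) = \mathcal{N}(0, I), this KL term has the closed form derived in the paper's Appendix B (J is the dimensionality of z):

-D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) = \frac{1}{2}\sum_{j=1}^J \left( 1 + \log((\sigma_j^{(i)})^2) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2 \right)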

multiple datapoints

a minibatch X^M = \{x^{(i)}\}_{i=1}^M of M datapoints, drawn randomly from the full dataset X with N datapoints

\mathcal{L}(\theta,\phi;X) \simeq \tilde{\mathcal{L}}^M(\theta,\phi;X^M) = \frac{N}{M}\sum_{i=1}^M \tilde{\mathcal{L}}(\theta,\phi;x^{(i)})
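
A small sketch of this minibatch scaling; estimator_fn is a hypothetical callable mapping one datapoint x^{(i)} to its per-datapoint estimate \tilde{\mathcal{L}}(\theta,\phi;x^{(i)}), e.g. a closure around the sgvb_a sketch above:

```python
def minibatch_bound(estimator_fn, minibatch, N):
    """Full-dataset bound estimate: (N/M) * sum over the M minibatch points."""
    M = len(minibatch)
    return (N / M) * sum(estimator_fn(x) for x in minibatch)
```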

AEVB

VAE

\log q_\phi(z|x^{(i)}) = \log \mathcal{N}(z;\mu^{(i)},\sigma^{2(i)}I)

\mu^{(i)} and \sigma^{(i)} are outputs of the encoding MLP
MLP: a nonlinear function of the datapoint x^{(i)} and the variational parameters \phi

sampling z^{(i, l)}

z^{(i,l)} \sim q_\phi(z|x^{(i)})

z^{(i,l)} = g_\phi(x^{(i)}, \epsilon^{(l)}) = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)}

where \epsilon^{(l)} \sim \mathcal{N}(0, I) and \odot denotes the element-wise product
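
A NumPy sketch of this encoder and sampling step, assuming a single-hidden-layer MLP; the weight names (W1, b1, W_mu, ...) are illustrative, the tanh nonlinearity follows the paper's MLP choice, and predicting \log \sigma^2 instead of \sigma keeps \sigma positive:

```python
import numpy as np

def encode_and_sample(x, params, rng=np.random.default_rng(0)):
    """Gaussian encoder: MLP maps x to (mu, log sigma^2), then samples
    z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization)."""
    W1, b1, W_mu, b_mu, W_lv, b_lv = params
    h = np.tanh(x @ W1 + b1)             # hidden layer: nonlinear in x and phi
    mu = h @ W_mu + b_mu                 # mu^{(i)}
    log_var = h @ W_lv + b_lv            # log (sigma^{(i)})^2
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)  # eps^{(l)} ~ N(0, I)
    z = mu + sigma * eps                 # z^{(i,l)} = mu + sigma ⊙ eps
    return z, mu, log_var
```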

\mathcal{L}(\theta,\phi;x^{(i)}) \simeq \frac{1}{2} \sum_{j=1}^J \left( 1 + \log((\sigma_j^{(i)})^2) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2 \right) + \frac{1}{L}\sum_{l=1}^L \log p_\theta(x^{(i)}|z^{(i,l)})
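
Combining the closed-form KL with a one-sample reconstruction term (the paper notes L = 1 suffices when the minibatch is large enough, e.g. M = 100) gives a sketch of the full per-datapoint estimator; the Bernoulli decoder decode is an assumption here, reusing the encode_and_sample sketch above:

```python
import numpy as np

def vae_bound(x, enc_params, decode, rng=np.random.default_rng(0)):
    """L~^B with analytic KL: 0.5 * sum_j (1 + log var - mu^2 - var)
    plus log p(x|z) for a Bernoulli decoder, using L = 1 sample."""
    z, mu, log_var = encode_and_sample(x, enc_params, rng)
    neg_kl = 0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    y = decode(z)                                  # Bernoulli means in (0, 1)
    log_px = np.sum(x * np.log(y) + (1 - x) * np.log(1 - y))
    return neg_kl + log_px                         # maximize w.r.t. theta, phi
```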

Experiments

likelihood lower bound: on MNIST and Frey Face, AEVB converges faster and reaches a better lower bound than the wake-sleep algorithm

marginal likelihood: with a low-dimensional latent space, the estimated marginal likelihood also favors AEVB over wake-sleep and Monte Carlo EM (MCEM)


Author: Tracy Liu

Posted on: 2019-07-31

Updated on: 2021-03-31
