Auto-Encoding Variational Bayes
ICLR 2014
Contents
Introduction
- Variational Bayesian (VB)
- Approximate posterior using MLP
- Stochastic Gradient Variational Bayes (SGVB)
- Auto-Encoding VB (AEVB) algorithm
- Variational auto-encoder (VAE)
Method
problem
- the integral of the marginal likelihood p_\theta(x) = \int p_\theta(z)p_\theta(x|z) dz is intractable
- a large dataset: the need of updating using small minibatches
graphical model
- Solid lines: generative model p_\theta(z)p_\theta(x|z)
- Dashed lines: variational approximation q_\phi(z|x) to the posterior p_\theta(z|x)
encoder
q_\phi(z|x): an approximation to the intractable true posterior p_\theta(z|x)
decoder
p_\theta(x|z)
The variational bound
the marginal likelihoods of x^{(i)}
log\ p_\theta(x^{(i)}) = log\ p_\theta(x^{(i)},z) - log\ p_\theta(z|x^{(i)})
log\ p_\theta(x^{(i)}) = D_{KL}(q_{\phi}(z|x^{(i)})||p_{\theta}(z|x^{(i)})) + \mathcal{L}(\theta,\phi;x^{(i)})
log\ p_\theta(x^{(i)}) \geq \mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-log\ q_\phi(z|x^{(i)}) + log\ p_{\theta}(x^{(i)},z)]
\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-log\ q_\phi(z|x^{(i)}) + log\ p_{\theta}(x^{(i)}|z) + log\ p_{\theta}(z)]
\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_{\phi}(z|x^{(i)})}[log\ p_\theta(x^{(i)}|z)]
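The decomposition log p(x) = D_KL(q||p(z|x)) + L can be checked numerically. Below is a sketch on an assumed conjugate toy model (not from the paper): p(z) = N(0,1), p(x|z) = N(z,1), q(z|x) = N(mu, sigma^2), all scalar, so every term has a closed form.

```python
import numpy as np

# Assumed toy model: p(z) = N(0,1), p(x|z) = N(z,1), q(z|x) = N(mu, sigma^2)
x = 0.8
mu, sigma = 0.4, 0.8

# Left side: log marginal likelihood; under this model x ~ N(0, 2)
log_px = -0.5 * (np.log(2 * np.pi * 2.0) + x ** 2 / 2.0)

# KL from q to the true posterior p(z|x) = N(x/2, 1/2)
pm, pv = x / 2.0, 0.5
kl_post = 0.5 * (np.log(pv / sigma ** 2) + sigma ** 2 / pv
                 + (mu - pm) ** 2 / pv - 1)

# The bound L computed the other way: -KL(q || p(z)) + E_q[log p(x|z)]
kl_prior = 0.5 * (mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1)
exp_log_lik = -0.5 * np.log(2 * np.pi) - ((x - mu) ** 2 + sigma ** 2) / 2
elbo = -kl_prior + exp_log_lik

print(log_px, kl_post + elbo)  # the two sides agree; kl_post >= 0 gives the bound
```

Since D_KL is nonnegative, L is a lower bound on log p(x), tight exactly when q matches the true posterior.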
The SGVB estimator and AEVB algorithm
optimize the lower bound \mathcal{L}(\theta,\phi;x^{(i)}) by gradient ascent w.r.t. both \theta and \phi
differentiable transformation for z
reparameterization
\tilde{z} = g_\phi(\epsilon, x) with \epsilon \sim p(\epsilon)
Monte Carlo estimates
\mathbb{E}_{q_\phi(z|x^{(i)})}[f(z)] = \mathbb{E}_{p(\epsilon)}[f(g_\phi(\epsilon, x^{(i)}))] \simeq \frac{1}{L} \Sigma_{l=1}^L f(g_\phi(\epsilon^{(l)}, x^{(i)}))
where \epsilon^{(l)} \sim p(\epsilon), drawing L noise samples per datapoint
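The reparameterized Monte Carlo estimate above can be sketched in NumPy. Here q_\phi(z|x) is assumed Gaussian, and f(z) = z^2 is an illustrative test function chosen so the exact expectation is known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed: q_phi(z|x) = N(mu, sigma^2) for scalar z; f(z) = z^2 for illustration
mu, sigma = 1.0, 0.5

def g_phi(eps):
    # Differentiable transform: z = g_phi(eps, x) = mu + sigma * eps
    return mu + sigma * eps

L = 100_000
eps = rng.standard_normal(L)          # eps^(l) ~ p(eps) = N(0, I)
estimate = np.mean(g_phi(eps) ** 2)   # (1/L) sum_l f(g_phi(eps^(l), x))

print(estimate)  # close to the exact E[z^2] = mu^2 + sigma^2 = 1.25
```

Because the randomness lives in \epsilon rather than in z, the estimate is a differentiable function of (mu, sigma), which is what allows gradient ascent on \phi.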
generic SGVB
from
\mathcal{L}(\theta,\phi;x^{(i)}) = \mathbb{E}_{q_{\phi}(z|x^{(i)})}[-log\ q_\phi(z|x^{(i)}) + log\ p_{\theta}(x^{(i)},z)]
to
\tilde{\mathcal{L}}^A(\theta,\phi;x^{(i)}) = \frac{1}{L}\Sigma_{l=1}^L(log\ p_\theta(x^{(i)}, z^{(i,l)}) - log\ q_\phi(z^{(i,l)}|x^{(i)}))
where z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)}) and \epsilon^{(i,l)} \sim p(\epsilon)
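A minimal sketch of the generic estimator \tilde{L}^A, on an assumed conjugate toy model (not from the paper) so the result can be sanity-checked: p(z) = N(0,1), p(x|z) = N(z,1), q_\phi(z|x) = N(mu, sigma^2), scalar x and z.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(v, mean, var):
    # log density of N(mean, var) evaluated at v
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

# Assumed toy model: p(z) = N(0,1), p(x|z) = N(z,1), q_phi(z|x) = N(mu, sigma^2)
x = 0.8
mu, sigma = 0.4, 0.8          # phi = (mu, sigma)

L = 50_000
eps = rng.standard_normal(L)  # eps^(i,l) ~ p(eps)
z = mu + sigma * eps          # z^(i,l) = g_phi(eps^(i,l), x^(i))

log_p_xz = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(x, z)
log_q = log_normal(z, mu, sigma ** 2)                       # log q_phi(z|x)
elbo_A = np.mean(log_p_xz - log_q)                          # tilde L^A
print(elbo_A)
```

With these numbers the estimate sits below the true log marginal likelihood log p(x) ≈ -1.43, as a lower bound should.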
second version of the SGVB
- D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z)) can be integrated analytically
- has less variance than the generic one
from
\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)}) || p_\theta(z)) + \mathbb{E}_{q_{\phi}(z|x^{(i)})}[log\ p_\theta(x^{(i)}|z)]
to
\tilde{\mathcal{L}}^B(\theta,\phi;x^{(i)}) = -D_{KL}(q_\phi(z|x^{(i)})||p_\theta(z)) + \frac{1}{L}\Sigma_{l=1}^L log\ p_\theta(x^{(i)}|z^{(i,l)})
where z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)}) and \epsilon^{(i,l)} \sim p(\epsilon)
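A sketch of the second estimator \tilde{L}^B on the same kind of assumed toy model (p(z) = N(0,1), p(x|z) = N(z,1), q_\phi(z|x) = N(mu, sigma^2), all scalar): the KL term against the prior is evaluated in closed form, and only the reconstruction term is sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(v, mean, var):
    # log density of N(mean, var) evaluated at v
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

# Assumed toy model: p(z) = N(0,1), p(x|z) = N(z,1), q_phi(z|x) = N(mu, sigma^2)
x = 0.8
mu, sigma = 0.4, 0.8

# Analytic KL(q_phi(z|x) || p(z)) between N(mu, sigma^2) and N(0, 1)
kl = 0.5 * (mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1)

L = 50_000
eps = rng.standard_normal(L)                  # eps^(i,l) ~ N(0, 1)
z = mu + sigma * eps                          # z^(i,l) via reparameterization
elbo_B = -kl + np.mean(log_normal(x, z, 1.0)) # tilde L^B
print(elbo_B)
```

Only the expected log-likelihood is estimated by sampling, so \tilde{L}^B has noise from one term rather than three, which is why it typically has lower variance than \tilde{L}^A.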
multiple datapoints
a full dataset X with N datapoints; a random minibatch X^M = \{x^{(i)}\}_{i=1}^M of M datapoints
\mathcal{L}(\theta,\phi;X) \simeq \tilde{\mathcal{L}}^M(\theta,\phi;X^M) = \frac{N}{M}\Sigma_{i=1}^M \tilde{\mathcal{L}}(\theta,\phi;x^{(i)})
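The N/M scaling turns a minibatch sum into an (unbiased) estimate of the full-dataset bound. A minimal sketch, where the per-datapoint values \tilde{L}(x^{(i)}) are stand-in random numbers rather than real bound estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed: per-datapoint bound estimates are already computed; here they are
# faked with random placeholder values for N = 1000 datapoints.
N = 1000
per_point = rng.normal(-1.4, 0.1, size=N)   # placeholder tilde_L(x^(i)) values

M = 100                                     # minibatch size
batch = rng.choice(per_point, size=M, replace=False)
full_estimate = (N / M) * batch.sum()       # tilde L^M: scaled to the dataset

print(full_estimate, per_point.sum())       # minibatch estimate vs. full sum
```

This is what lets AEVB run on large datasets: each update touches only M datapoints but still targets the full-data objective.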
AEVB
VAE
log\ q_\phi(z|x^{(i)}) = log\ \mathcal{N}(z;\mu^{(i)},\sigma^{2(i)}I)
\mu^{(i)} and \sigma^{(i)} are outputs of the encoding MLP
MLP: a nonlinear function of x^{(i)} with parameters \phi
sampling z^{(i, l)}
z^{(i, l)} \sim q_\phi(z|x^{(i)})
z^{(i, l)} = g_\phi(x^{(i)}, \epsilon^{(l)}) = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)} (element-wise product)
where \epsilon^{(l)} \sim \mathcal{N}(0, I)
\mathcal{L}(\theta,\phi;x^{(i)}) \simeq \frac{1}{2} \Sigma_{j=1}^J(1 + log((\sigma_j^{(i)})^2) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2) + \frac{1}{L}\Sigma_{l=1}^Llog\ p_\theta(x^{(i)}|z^{(i,l)})
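The VAE estimator above can be sketched end to end in NumPy. The decoder here is assumed Bernoulli with fixed random weights purely for illustration (the paper uses Bernoulli or Gaussian MLPs); in a real VAE \mu^{(i)}, \sigma^{(i)}, and the decoder logits come from trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# One binary datapoint of 4 dims; J = 2 latent dims (tiny sizes for illustration)
x = rng.integers(0, 2, size=4).astype(float)
J = 2

# Pretend encoder outputs (in a real VAE these come from an MLP on x)
mu = np.array([0.1, -0.2])
log_var = np.array([-0.5, 0.3])
sigma = np.exp(0.5 * log_var)

# Closed-form first term: (1/2) sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2),
# i.e. the negative KL to the standard normal prior
kl_term = 0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

# Reconstruction term via reparameterization: z = mu + sigma * eps
L = 1
eps = rng.standard_normal((L, J))
z = mu + sigma * eps

# Pretend Bernoulli decoder: fixed random weights mapping z to logits (assumption)
W = rng.standard_normal((J, 4)) * 0.1
p = 1.0 / (1.0 + np.exp(-(z @ W)))
log_px_z = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p), axis=1)

elbo = kl_term + np.mean(log_px_z)
print(elbo)
```

Note the sign convention: kl_term as written is -D_KL(q||p(z)), so maximizing the bound both pulls q toward the prior and improves reconstruction.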
Experiments
likelihood lower bound: AEVB compared with the wake-sleep algorithm on MNIST and Frey Face
marginal likelihood: estimated marginal likelihood, AEVB compared with wake-sleep and Monte Carlo EM
Reference
Kingma, D. P., & Welling, M. Auto-Encoding Variational Bayes. ICLR 2014.