Deep Bayes: Variational inference
Reference: Deep Bayes
Full Bayesian inference
Training Stage
Testing Stage
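The two stages can be written out as below. This is the standard formulation; the symbols are assumptions chosen here ($X_{tr}, Y_{tr}$ for training inputs and targets, $\theta$ for the parameters, $(x, y)$ for a test pair).

```latex
% Training stage: posterior over the parameters (Bayes' theorem)
p(\theta \mid X_{tr}, Y_{tr})
  = \frac{p(Y_{tr} \mid X_{tr}, \theta)\, p(\theta)}
         {\int p(Y_{tr} \mid X_{tr}, \theta)\, p(\theta)\, d\theta}

% Testing stage: predictive distribution, averaged over the posterior
p(y \mid x, X_{tr}, Y_{tr})
  = \int p(y \mid x, \theta)\, p(\theta \mid X_{tr}, Y_{tr})\, d\theta
```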
Comment: The denominator in the training stage (the evidence) is often intractable. Posterior distributions can be computed analytically only for simple conjugate models.
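As an illustration of a conjugate model where the posterior is available in closed form, here is a minimal sketch using a Beta prior with a Bernoulli likelihood. The model choice, the function names, and the data are assumptions made for this example, not part of the original notes.

```python
from fractions import Fraction

def beta_bernoulli_posterior(alpha, beta, observations):
    """Return Beta posterior parameters after observing 0/1 outcomes.

    Conjugacy means the posterior is again a Beta distribution, so the
    intractable-looking evidence integral never has to be computed.
    """
    ones = sum(observations)
    zeros = len(observations) - ones
    return alpha + ones, beta + zeros

def predictive_probability_of_one(alpha, beta):
    """P(next outcome = 1) under a Beta(alpha, beta) posterior."""
    return Fraction(alpha, alpha + beta)

data = [1, 0, 1, 1, 0, 1, 1, 1]                            # hypothetical observations
a_post, b_post = beta_bernoulli_posterior(1, 1, data)      # uniform Beta(1, 1) prior
p_next = predictive_probability_of_one(a_post, b_post)
```

Both the training stage (posterior update) and the testing stage (predictive probability) reduce to simple arithmetic here, which is exactly what is lost for non-conjugate models.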
Approximate inference
Probabilistic model:
Variational Inference:
Biased, but faster and more scalable
MCMC:
Samples from the unnormalized posterior; unbiased, but needs a lot of samples
Some mathematical magic:
The first term is the ELBO (evidence lower bound).
The second term is the KL divergence (Kullback-Leibler divergence).
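A sketch of the decomposition that these two terms come from, assuming the notation $p(x, \theta) = p(x \mid \theta)\, p(\theta)$ for the probabilistic model and $q(\theta)$ for the approximation (the symbols are assumptions chosen here):

```latex
\log p(x)
  = \int q(\theta) \log p(x)\, d\theta
  = \int q(\theta) \log \frac{p(x, \theta)}{p(\theta \mid x)}\, d\theta
  = \underbrace{\int q(\theta) \log \frac{p(x, \theta)}{q(\theta)}\, d\theta}_{\mathcal{L}(q)\ \text{(ELBO)}}
  + \underbrace{\int q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)}\, d\theta}_{\mathrm{KL}(q(\theta)\,\|\,p(\theta \mid x))}
```

Since the KL term is non-negative and $\log p(x)$ does not depend on $q$, maximizing the ELBO over $q$ is equivalent to minimizing the KL divergence to the true posterior.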
Variational Inference: ELBO interpretation
Final optimisation problem
The first term is the data term; the second term is a regularizer.
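The "data term minus regularizer" reading can be checked numerically on a toy model where everything is available in closed form. The sketch below assumes a model chosen purely for illustration: prior $\theta \sim \mathcal{N}(0, 1)$, likelihood $x_i \mid \theta \sim \mathcal{N}(\theta, 1)$, and a Gaussian $q(\theta) = \mathcal{N}(m, v)$. All function names are hypothetical.

```python
import math

def gaussian_kl(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def elbo(xs, m, v):
    """ELBO = E_q[log p(x | theta)] - KL(q || prior), with q = N(m, v)."""
    n = len(xs)
    data_term = -0.5 * n * math.log(2 * math.pi) \
                - 0.5 * sum((x - m) ** 2 + v for x in xs)
    regularizer = gaussian_kl(m, v, 0.0, 1.0)
    return data_term - regularizer

def log_evidence(xs):
    """Exact log p(x) for this conjugate model (x ~ N(0, I + 11^T))."""
    n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1) \
           - 0.5 * (s2 - s1 ** 2 / (n + 1))

xs = [0.5, 1.5, -0.3, 2.0]                 # hypothetical observations
post_mean = sum(xs) / (len(xs) + 1)        # exact posterior mean
post_var = 1.0 / (len(xs) + 1)             # exact posterior variance

m, v = 0.2, 0.7                            # an arbitrary, suboptimal q
gap = log_evidence(xs) - elbo(xs, m, v)
kl_to_posterior = gaussian_kl(m, v, post_mean, post_var)
```

The gap between the log evidence and the ELBO equals the KL divergence from $q$ to the true posterior, and it closes when $q$ is set to the exact posterior.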
Mean field approximation
We can then use the following substitution to reformulate the optimisation problem:
The problem then becomes:
Update each factor in turn, keeping the others fixed:
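For reference, the standard mean field update can be sketched as follows. The factor index $j$, the number of factors $m$, and the joint $p(x, \theta)$ are notation assumed here, not taken from the notes.

```latex
% Factorized family (mean field assumption)
q(\theta) = \prod_{j=1}^{m} q_j(\theta_j)

% Coordinate update for factor j, with all other factors held fixed
\log q_j(\theta_j) = \mathbb{E}_{q_{i \neq j}} \log p(x, \theta) + \mathrm{const}
\quad\Longleftrightarrow\quad
q_j(\theta_j) =
  \frac{\exp\left(\mathbb{E}_{q_{i \neq j}} \log p(x, \theta)\right)}
       {\int \exp\left(\mathbb{E}_{q_{i \neq j}} \log p(x, \theta)\right) d\theta_j}
```

Each update does not decrease the ELBO, so cycling through the factors gives a block-coordinate ascent scheme.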
Parametric optimization
Inference Summary
Statistical Inference
A model with continuous latent variables can be regarded as a mixture of a continuum of distributions.
The E-step can be done in closed form only in the case of conjugate distributions; otherwise the true posterior is intractable.
Continuous latent variables are typically used for dimensionality reduction, also known as representation learning.
Example: PCA model
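Consider $x \in \mathbb{R}^D$ and $z \in \mathbb{R}^d$, such that $D \gg d$
Joint distribution:
The parameter set $\theta$ consists of a $D \times d$ matrix $V$, a $D$-dimensional vector, and a scalar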
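A sketch of the joint distribution in the usual probabilistic PCA form. The matrix $V$ is from the notes; the names $\mu$ for the $D$-dimensional vector and $\sigma$ for the scalar are assumptions made here.

```latex
p(X, Z \mid \theta)
  = \prod_{i=1}^{n} p(x_i \mid z_i, \theta)\, p(z_i)
  = \prod_{i=1}^{n} \mathcal{N}(x_i \mid V z_i + \mu,\ \sigma^2 I)\, \mathcal{N}(z_i \mid 0, I),
\qquad \theta = (V, \mu, \sigma)
```

Because both factors are Gaussian and the mean is linear in $z_i$, the posterior over $z_i$ is Gaussian as well, which is why the E-step is tractable in this model.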
EM-PCA and Mixture of PCA
joint distribution:
Variational autoencoder
EM for VAE
However, the denominator is still intractable.
Variational inference
parametric variational inference
Instead of directly inferring $p(z_i \mid x_i, \theta)$, let us define a flexible variational approximation:
This additional neural network keeps the distribution tractable while remaining very flexible.
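One common choice, given here only as a sketch, is a Gaussian whose mean and diagonal variance are outputs of the additional network. The symbol $\phi$ for that network's weights and the functions $\mu_\phi, \sigma_\phi$ are assumptions; the notes do not fix them.

```latex
% Amortized variational approximation (encoder)
q(z_i \mid x_i, \phi)
  = \mathcal{N}\!\left(z_i \mid \mu_\phi(x_i),\ \mathrm{diag}\,\sigma^2_\phi(x_i)\right)

% Resulting lower bound, optimized jointly over theta and phi
\mathcal{L}(\theta, \phi)
  = \sum_{i=1}^{n} \left[
      \mathbb{E}_{q(z_i \mid x_i, \phi)} \log p(x_i \mid z_i, \theta)
      - \mathrm{KL}\left(q(z_i \mid x_i, \phi)\,\|\,p(z_i)\right)
    \right]
  \le \log p(X \mid \theta)
```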
Stochastic optimization
Problem 1: The training set is assumed to be large, so each pass over the full data may be expensive
Problem 2: The integral in the ELBO is still intractable
Solution: Compute stochastic gradients using mini-batching and Monte Carlo estimation
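The mini-batching part of this solution rests on a simple fact: a rescaled sum over a random subset is an unbiased estimate of the full sum. A minimal sketch, with a toy list of numbers standing in for the per-example ELBO terms (the function and variable names are hypothetical):

```python
import random

def minibatch_estimate(per_example_terms, batch_size, rng):
    """Unbiased estimate of sum(per_example_terms) from one random mini-batch."""
    n = len(per_example_terms)
    batch = rng.sample(per_example_terms, batch_size)   # sampled without replacement
    return (n / batch_size) * sum(batch)

rng = random.Random(0)
terms = [float(i) for i in range(100)]    # stand-ins for per-example ELBO terms
full_sum = sum(terms)

# Each estimate touches only 10 of the 100 terms; on average they recover the full sum.
estimates = [minibatch_estimate(terms, 10, rng) for _ in range(2000)]
average_estimate = sum(estimates) / len(estimates)
```

A single estimate is noisy, but because it is unbiased it can be used directly inside stochastic gradient ascent.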
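Optimization w.r.t. the model parameters $\theta$:
Using a Monte Carlo estimate of the expectation:
However, for the parameters of the variational distribution the situation is different:
The gradient can no longer be moved inside the integral, because the density being integrated against depends on these parameters.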
Log-derivative trick
Applying the trick yields:
The expectations can then be estimated using Monte Carlo methods.
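A minimal sketch of the log-derivative trick on a toy problem, assuming $z \sim \mathcal{N}(\mu, 1)$ and $f(z) = z^2$ (both chosen only for illustration). The identity used is $\nabla_\mu \mathbb{E}[f(z)] = \mathbb{E}[f(z)\, \nabla_\mu \log \mathcal{N}(z \mid \mu, 1)]$, and for this choice the exact gradient is $2\mu$.

```python
import random

def score_function_gradient(f, mu, num_samples, rng):
    """Estimate d/dmu E_{z ~ N(mu, 1)}[f(z)] with the log-derivative trick."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mu, 1.0)
        score = z - mu               # d/dmu log N(z | mu, 1)
        total += f(z) * score        # f itself is never differentiated
    return total / num_samples

rng = random.Random(0)
mu = 1.5
grad_estimate = score_function_gradient(lambda z: z * z, mu, 200_000, rng)
exact_gradient = 2.0 * mu            # because E[z^2] = mu^2 + 1
```

Note that the estimator only needs samples of $z$ and the score, so it also applies when $f$ is not differentiable; the price is a high variance per sample.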
Log-derivative trick for ELBO
Now consider its first term and apply mini-batching and the log-derivative trick:
One can show that the score function (the gradient of $\log q$ with respect to the variational parameters) has zero mean.
However, the log-likelihood term can be an arbitrarily large negative number, which leads to very unstable stochastic gradients.
A partial solution is to use baselines
Consider a function such that:
Recall that the score function meets this requirement.
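The effect of a baseline can be seen on a toy problem. The sketch below assumes $z \sim \mathcal{N}(\mu, 1)$, $f(z) = z^2$, and a constant baseline equal to $\mathbb{E}[f(z)] = \mu^2 + 1$; all of these are illustrative choices, not the notes' own setup. Because the score has zero mean, subtracting a constant from $f$ leaves the gradient estimate unbiased.

```python
import random

def gradient_samples(f, mu, baseline, num_samples, rng):
    """Per-sample score-function gradient terms (f(z) - baseline) * score."""
    samples = []
    for _ in range(num_samples):
        z = rng.gauss(mu, 1.0)
        score = z - mu                        # zero-mean under N(mu, 1)
        samples.append((f(z) - baseline) * score)
    return samples

def mean_and_variance(values):
    """Sample mean and (population) variance of a list of floats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

rng = random.Random(0)
mu = 1.5
f = lambda z: z * z

plain = gradient_samples(f, mu, 0.0, 100_000, rng)
with_baseline = gradient_samples(f, mu, mu * mu + 1.0, 100_000, rng)

mean_plain, var_plain = mean_and_variance(plain)
mean_baseline, var_baseline = mean_and_variance(with_baseline)
```

Both estimators target the same gradient ($2\mu = 3$ here), but the one with the baseline has a noticeably smaller variance. This is only a partial fix: the variance is reduced, not removed.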
Reparameterization trick
Consider differentiating a complex expectation:
Express the random variable as a deterministic function g(.) of an auxiliary noise variable and the parameter, then apply the change-of-variables rule:
Then stochastic differentiation is simply:
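A minimal sketch of the reparameterization trick for a Gaussian, assuming $z = g(\epsilon; \mu, \sigma) = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$ and the toy objective $f(z) = z^2$ (all illustrative choices). The gradient passes through $g$ by the chain rule, so each sample contributes $f'(z)\, \partial z / \partial \mu$ and $f'(z)\, \partial z / \partial \sigma$.

```python
import random

def reparameterized_gradients(mu, sigma, num_samples, rng):
    """Estimate d/dmu and d/dsigma of E[z^2] with z = mu + sigma * eps."""
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)     # noise does not depend on the parameters
        z = mu + sigma * eps          # z = g(eps; mu, sigma)
        df_dz = 2.0 * z               # derivative of f(z) = z^2
        grad_mu += df_dz * 1.0        # dz/dmu = 1
        grad_sigma += df_dz * eps     # dz/dsigma = eps
    return grad_mu / num_samples, grad_sigma / num_samples

rng = random.Random(0)
grad_mu, grad_sigma = reparameterized_gradients(1.5, 0.8, 100_000, rng)
# Exact values for comparison: E[z^2] = mu^2 + sigma^2, so the gradients are 2*mu and 2*sigma.
```

Unlike the log-derivative trick, this estimator uses the derivative of $f$, which typically gives much lower variance but requires $f$ to be differentiable and $z$ to be continuous.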