Evidence Lower Bound

An overview of the concept “ELBO”. In this post, we discuss a concept central to theoretical machine learning, one that appears often in proofs and derivations of energy functions, convergence results, and the like.

ELBO

The evidence lower bound (ELBO) is a quantity that lies at the core of a number of important probabilistic inference methods, such as expectation-maximization and variational inference.

In a latent variable model, we posit that our observed data is a realization of some random variable $X$. Moreover, we posit the existence of another random variable $Z$, where $X$ and $Z$ are distributed according to a joint distribution $p(x, z; \theta)$, where $\theta$ parameterizes the distribution. Note that $X$ is observed and $Z$ is not (hence, latent).
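
To make this setup concrete, here is a minimal sketch of one such model, a two-component Gaussian mixture; the model choice and all numbers are illustrative assumptions rather than anything from the original discussion:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical latent variable model: a two-component Gaussian mixture.
# Z ~ Categorical(pi) is latent, X | Z = k ~ Normal(mu_k, 1) is observed,
# and theta = (pi, mu) parameterizes the joint distribution p(x, z; theta).
rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])   # mixing weights (part of theta)
mu = np.array([-2.0, 2.0])  # component means (part of theta)

z = rng.choice(2, p=pi)     # latent draw: unobserved in practice
x = rng.normal(mu[z], 1.0)  # observed draw

def log_joint(x, z, pi, mu):
    """log p(x, z; theta) = log p(z; theta) + log p(x | z; theta)."""
    return np.log(pi[z]) + norm.logpdf(x, loc=mu[z], scale=1.0)

def log_marginal(x, pi, mu):
    """log p(x; theta), summing the joint over both latent states."""
    return np.logaddexp(log_joint(x, 0, pi, mu), log_joint(x, 1, pi, mu))
```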

There are two predominant tasks that are of interest here:

  • Given some fixed value of $\theta$, we would like to compute the posterior distribution over the latent variable, i.e., $p(z \mid x; \theta)$ (a sketch of this computation follows the list).
  • Given that $\theta$ is unknown, we would like to find the maximum likelihood estimate of $\theta$: $\hat{\theta} := \arg\max_{\theta} \ell(\theta)$. Here, $\ell(\theta)$ is the log-likelihood function: $\ell(\theta) := \log p(x; \theta)$.
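
For the hypothetical mixture model sketched above, Task 1 reduces to Bayes' rule, since $p(z \mid x; \theta) = p(x, z; \theta) / p(x; \theta)$; the observation value below is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.stats import norm

# Task 1 on the hypothetical two-component Gaussian mixture from above:
# compute the posterior p(z | x; theta) = p(x, z; theta) / p(x; theta).
pi = np.array([0.3, 0.7])   # mixing weights (part of theta)
mu = np.array([-2.0, 2.0])  # component means (part of theta)
x = 1.5                     # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)  # log p(x, z)
log_evidence = np.logaddexp.reduce(log_joint)               # log p(x)
posterior = np.exp(log_joint - log_evidence)                # p(z | x)
print(posterior)  # a distribution over the two latent states; sums to 1
```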

Variational inference is used for Task 1, and expectation-maximization is used for Task 2.

What is ELBO?

Evidence is the name given to the log-likelihood function evaluated at a fixed parameter $\theta$:

$$\text{evidence} := \ell(\theta) = \log p(x; \theta)$$

Intuitively, if we have chosen the right model and the right $\theta$, then we would expect the marginal probability of our observed data, $p(x; \theta)$, to be high. Thus, a higher value of the evidence suggests we are on the right track. The derivation of the lower bound goes as follows: for any distribution $q(z)$ over the latent variable,

$$\begin{aligned} \log p(x; \theta) &= \log \int p(x, z; \theta) \, dz \\ &= \log \int q(z) \, \frac{p(x, z; \theta)}{q(z)} \, dz \\ &= \log \mathbb{E}_{Z \sim q}\left[\frac{p(x, Z; \theta)}{q(Z)}\right] \\ &\geq \mathbb{E}_{Z \sim q}\left[\log \frac{p(x, Z; \theta)}{q(Z)}\right] =: \text{ELBO} \end{aligned}$$

The final line makes use of Jensen's inequality for a concave function ($\log$ is concave, since its second derivative is negative).
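
As a sanity check on the inequality, here is a small numerical sketch using the hypothetical mixture model from above; the choice of $q$ is arbitrary, and any valid $q$ keeps the ELBO at or below the evidence:

```python
import numpy as np
from scipy.stats import norm

# Numerical sanity check that ELBO <= evidence on the hypothetical
# two-component Gaussian mixture used throughout these sketches.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 2.0])
x = 1.5  # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)  # log p(x, z)
evidence = np.logaddexp.reduce(log_joint)                   # log p(x)

q = np.array([0.5, 0.5])  # an arbitrary variational distribution q(z)
elbo = np.sum(q * (log_joint - np.log(q)))  # E_q[log p(x, Z) - log q(Z)]

print(f"evidence = {evidence:.4f}, ELBO = {elbo:.4f}")
assert elbo <= evidence  # Jensen's inequality guarantees this for any q
```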

The gap between the evidence and the ELBO can also be computed explicitly; it is exactly the KL divergence between $q$ and the true posterior:

$$\begin{aligned} \text{evidence} - \text{ELBO} &= \log p(x; \theta) - \mathbb{E}_{Z \sim q}\left[\log \frac{p(x, Z; \theta)}{q(Z)}\right] \\ &= \mathbb{E}_{Z \sim q}\left[\log \frac{q(Z) \, p(x; \theta)}{p(x, Z; \theta)}\right] \\ &= \mathbb{E}_{Z \sim q}\left[\log \frac{q(Z)}{p(Z \mid x; \theta)}\right] \\ &= \mathrm{KL}\big(q(z) \,\|\, p(z \mid x; \theta)\big) \end{aligned}$$

In particular, the bound is tight exactly when $q(z) = p(z \mid x; \theta)$.
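
This identity can also be checked numerically on the same hypothetical mixture model:

```python
import numpy as np
from scipy.stats import norm

# Check that evidence - ELBO equals KL(q || p(z | x; theta)) on the
# same hypothetical two-component Gaussian mixture.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 2.0])
x = 1.5  # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)
evidence = np.logaddexp.reduce(log_joint)
log_posterior = log_joint - evidence         # log p(z | x; theta)

q = np.array([0.5, 0.5])                     # an arbitrary q(z)
elbo = np.sum(q * (log_joint - np.log(q)))
kl = np.sum(q * (np.log(q) - log_posterior))

assert np.isclose(evidence - elbo, kl)  # the gap is exactly the KL term
```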

Written on June 15, 2022