Evidence Lower Bound

An overview of the concept “ELBO”. In this post, we discuss a concept central to theoretical machine learning, one that appears often in proofs and derivations of energy functions, convergence results, and the like.

ELBO

The evidence lower bound (ELBO) is a quantity that lies at the core of a number of important probabilistic inference methods, such as expectation-maximization and variational inference.

In a latent variable model, we posit that our observed data is a realization of some random variable $X$. Moreover, we posit the existence of another random variable $Z$, where $X$ and $Z$ are distributed according to a joint distribution $p(x, z; \theta)$, where $\theta$ parameterizes the distribution. Note that $X$ is observed and $Z$ is not (hence, latent).
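
To make this setup concrete, here is a minimal sketch of one such model, a two-component Gaussian mixture; the model choice and all numbers are illustrative assumptions rather than anything from the original discussion:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical latent variable model: a two-component Gaussian mixture.
# Z ~ Categorical(pi) is latent, X | Z = k ~ Normal(mu_k, 1) is observed,
# and theta = (pi, mu) parameterizes the joint distribution p(x, z; theta).
rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])   # mixing weights (part of theta)
mu = np.array([-2.0, 2.0])  # component means (part of theta)

z = rng.choice(2, p=pi)     # latent draw: unobserved in practice
x = rng.normal(mu[z], 1.0)  # observed draw

def log_joint(x, z, pi, mu):
    """log p(x, z; theta) = log p(z; theta) + log p(x | z; theta)."""
    return np.log(pi[z]) + norm.logpdf(x, loc=mu[z], scale=1.0)

def log_marginal(x, pi, mu):
    """log p(x; theta), summing the joint over both latent states."""
    return np.logaddexp(log_joint(x, 0, pi, mu), log_joint(x, 1, pi, mu))
```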

There are two predominant tasks that are of interest here:

  • Given some fixed value of $\theta$, we would like to compute the posterior distribution over the latent variable, i.e., $p(z \mid x; \theta)$ (a sketch of this computation follows the list).
  • Given that $\theta$ is unknown, we would like to find the maximum likelihood estimate of $\theta$: $\hat{\theta} := \arg\max_{\theta} \ell(\theta)$. Here, $\ell(\theta)$ is the log-likelihood function: $\ell(\theta) := \log p(x; \theta)$.
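
For the hypothetical mixture model sketched above, Task 1 reduces to Bayes' rule, since $p(z \mid x; \theta) = p(x, z; \theta) / p(x; \theta)$; the observation value below is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.stats import norm

# Task 1 on the hypothetical two-component Gaussian mixture from above:
# compute the posterior p(z | x; theta) = p(x, z; theta) / p(x; theta).
pi = np.array([0.3, 0.7])   # mixing weights (part of theta)
mu = np.array([-2.0, 2.0])  # component means (part of theta)
x = 1.5                     # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)  # log p(x, z)
log_evidence = np.logaddexp.reduce(log_joint)               # log p(x)
posterior = np.exp(log_joint - log_evidence)                # p(z | x)
print(posterior)  # a distribution over the two latent states; sums to 1
```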

Variational inference is used for Task 1, and expectation-maximization is used for Task 2.

What is ELBO?

Evidence is the name given to the log-likelihood function evaluated at a fixed parameter $\theta$:

$$\text{evidence} := \ell(\theta) = \log p(x; \theta)$$

Intuitively, if we have chosen the right model and the right $\theta$, then we would expect the marginal probability of our observed data, $p(x; \theta)$, to be high. Thus, a higher value of the evidence suggests we are on the right track. The derivation of the lower bound goes as follows: for any distribution $q(z)$ over the latent variable,

$$\begin{aligned} \log p(x; \theta) &= \log \int p(x, z; \theta) \, dz \\ &= \log \int q(z) \, \frac{p(x, z; \theta)}{q(z)} \, dz \\ &= \log \mathbb{E}_{Z \sim q}\left[\frac{p(x, Z; \theta)}{q(Z)}\right] \\ &\geq \mathbb{E}_{Z \sim q}\left[\log \frac{p(x, Z; \theta)}{q(Z)}\right] =: \text{ELBO} \end{aligned}$$

The final line makes use of Jensen's inequality for a concave function ($\log$ is concave, since its second derivative is negative).
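
As a sanity check on the inequality, here is a small numerical sketch using the hypothetical mixture model from above; the choice of $q$ is arbitrary, and any valid $q$ keeps the ELBO at or below the evidence:

```python
import numpy as np
from scipy.stats import norm

# Numerical sanity check that ELBO <= evidence on the hypothetical
# two-component Gaussian mixture used throughout these sketches.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 2.0])
x = 1.5  # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)  # log p(x, z)
evidence = np.logaddexp.reduce(log_joint)                   # log p(x)

q = np.array([0.5, 0.5])  # an arbitrary variational distribution q(z)
elbo = np.sum(q * (log_joint - np.log(q)))  # E_q[log p(x, Z) - log q(Z)]

print(f"evidence = {evidence:.4f}, ELBO = {elbo:.4f}")
assert elbo <= evidence  # Jensen's inequality guarantees this for any q
```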

The gap between the evidence and the ELBO can also be computed explicitly; it is exactly the KL divergence between $q$ and the true posterior:

$$\begin{aligned} \text{evidence} - \text{ELBO} &= \log p(x; \theta) - \mathbb{E}_{Z \sim q}\left[\log \frac{p(x, Z; \theta)}{q(Z)}\right] \\ &= \mathbb{E}_{Z \sim q}\left[\log \frac{q(Z) \, p(x; \theta)}{p(x, Z; \theta)}\right] \\ &= \mathbb{E}_{Z \sim q}\left[\log \frac{q(Z)}{p(Z \mid x; \theta)}\right] \\ &= \mathrm{KL}\big(q(z) \,\|\, p(z \mid x; \theta)\big) \end{aligned}$$

In particular, the bound is tight exactly when $q(z) = p(z \mid x; \theta)$.
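
This identity can also be checked numerically on the same hypothetical mixture model:

```python
import numpy as np
from scipy.stats import norm

# Check that evidence - ELBO equals KL(q || p(z | x; theta)) on the
# same hypothetical two-component Gaussian mixture.
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 2.0])
x = 1.5  # an arbitrary fixed observation

log_joint = np.log(pi) + norm.logpdf(x, loc=mu, scale=1.0)
evidence = np.logaddexp.reduce(log_joint)
log_posterior = log_joint - evidence         # log p(z | x; theta)

q = np.array([0.5, 0.5])                     # an arbitrary q(z)
elbo = np.sum(q * (log_joint - np.log(q)))
kl = np.sum(q * (np.log(q) - log_posterior))

assert np.isclose(evidence - elbo, kl)  # the gap is exactly the KL term
```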

Written on June 15, 2022