Information Theory Basics

An overview of the concepts “Entropy”, “Mutual Information”, “KL Divergence”, and “F-Divergence”. In this post, we discuss concepts that are important in information theory and are often useful for building intuition and for proofs.

Entropy

The entropy of a probability distribution p(x) is denoted by H, such that H(X) = -\sum_x p(x) \log p(x) = \mathbb{E}_{x \sim p}[-\log p(x)]. Furthermore, we can prove that this entropy is bounded above using Jensen’s Inequality as follows: since \log is concave, H(X) = \mathbb{E}_{x \sim p}[\log(1/p(x))] \le \log \mathbb{E}_{x \sim p}[1/p(x)] = \log|\mathcal{X}|, so the entropy of a distribution over a finite set \mathcal{X} is at most \log|\mathcal{X}|, with equality exactly when p is uniform.
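
As a quick sanity check of the bound above, here is a minimal sketch (assuming NumPy is available; the distribution used is an arbitrary illustrative example) that computes the entropy of a discrete distribution and compares it against \log|\mathcal{X}|.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H(X) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

# An arbitrary example distribution over 4 outcomes.
p = np.array([0.5, 0.25, 0.15, 0.10])

print("H(X)      =", entropy(p))               # ~1.21 nats
print("log|X|    =", np.log(len(p)))           # ~1.386 nats, the Jensen upper bound
print("uniform H =", entropy(np.ones(4) / 4))  # the uniform distribution attains the bound
```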

Mutual Information

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the “amount of information” (in units such as shannons (bits), nats or hartleys) obtained about one random variable by observing the other random variable. The concept of mutual information is intimately linked to that of the entropy of a random variable, a fundamental notion in information theory that quantifies the expected “amount of information” held in a random variable. The mutual information is the KL divergence between the joint distribution p_{(X,Y)}(x, y) and the product of the marginal distributions p_X(x) and p_Y(y), that is, I(X; Y) = D_{KL}(p_{(X,Y)} \| p_X \otimes p_Y).
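
To make the KL-divergence view of mutual information concrete, here is a small sketch (assuming NumPy; the joint distribution is an arbitrary illustrative example) that computes I(X; Y) directly from a joint probability table as the divergence between the joint and the product of its marginals.

```python
import numpy as np

def mutual_information(p_xy, eps=1e-12):
    """I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ], in nats."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    ratio = p_xy / (p_x * p_y + eps)
    return float(np.sum(p_xy * np.log(ratio + eps)))

# Arbitrary example joint distribution over a 2x2 table (dependent variables).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print("I(X;Y) =", mutual_information(p_xy))      # > 0: X and Y are dependent

# Independent case: the joint factorises into its marginals, so I(X;Y) = 0.
p_indep = np.outer([0.5, 0.5], [0.5, 0.5])
print("I(X;Y) =", mutual_information(p_indep))   # ~0
```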

KL Divergence

In mathematical statistics, the Kullback-Leibler divergence, denoted by D_{KL}(P \| Q), is a type of statistical distance: a measure of how one probability distribution P differs from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P. For discrete distributions defined on the same sample space \mathcal{X}, it is defined as D_{KL}(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.
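
The sketch below (assuming NumPy; the distributions are arbitrary illustrative examples) evaluates this definition directly and also shows that the KL divergence is not symmetric in its arguments.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_x P(x) log [ P(x) / Q(x) ], in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Arbitrary example distributions over 3 outcomes.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

print("D_KL(P || Q) =", kl_divergence(p, q))
print("D_KL(Q || P) =", kl_divergence(q, p))  # differs: KL is not symmetric
print("D_KL(P || P) =", kl_divergence(p, p))  # 0 for identical distributions
```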

F Divergence

In probability theory, an f-divergence is a function D_f(P \| Q) that measures the difference between two probability distributions P and Q. Many common divergences, such as the KL divergence, the Hellinger distance, and the total variation distance, are special cases of f-divergence. Given a convex function f with f(1) = 0, the f-divergence of P from Q is defined as D_f(P \| Q) = \mathbb{E}_{Q}\left[ f\left( \frac{dP}{dQ} \right) \right], whenever P is absolutely continuous with respect to Q.

Here, f is called the generator of D_f(P \| Q). In concrete applications, there is usually a reference distribution \mu on \Omega such that P, Q \ll \mu; then we can use the Radon-Nikodym theorem to take their probability densities p and q, giving D_f(P \| Q) = \int_\Omega f\left( \frac{p(x)}{q(x)} \right) q(x) \, d\mu(x).

When the generator function is f(x) = x\log x, we get the KL divergence.
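
As a small check of that statement, here is a sketch (assuming NumPy; the distributions are arbitrary illustrative examples) that implements the discrete f-divergence \sum_x q(x) f(p(x)/q(x)) and confirms that the generator x\log x reproduces the KL divergence, while |x - 1|/2 gives the total variation distance.

```python
import numpy as np

def f_divergence(p, q, f, eps=1e-12):
    """Discrete f-divergence D_f(P || Q) = sum_x q(x) f(p(x) / q(x))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / (q + eps))))

# Generators for two familiar special cases.
kl_gen = lambda x: x * np.log(x + 1e-12)   # f(x) = x log x   -> KL divergence
tv_gen = lambda x: 0.5 * np.abs(x - 1.0)   # f(x) = |x - 1|/2 -> total variation distance

# Arbitrary example distributions.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

print("D_f with x log x  =", f_divergence(p, q, kl_gen))          # equals D_KL(P || Q)
print("D_KL directly     =", float(np.sum(p * np.log(p / q))))    # same value
print("D_f with |x-1|/2  =", f_divergence(p, q, tv_gen))          # equals TV(P, Q)
print("TV directly       =", 0.5 * float(np.sum(np.abs(p - q))))  # same value
```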

Written on June 17, 2022