Smooth Maximum
An overview of the concept “Smooth Maximum”. In this work, we discuss the smooth maximum: a family of smooth, differentiable approximations to the maximum operator, useful in deep learning where gradients must flow through a max.
Smooth Maximum
For large positive values of the parameter $\alpha$, the following formulation is a smooth, differentiable approximation of the maximum function; for negative values of $\alpha$ that are large in absolute value, it approximates the minimum:

$$\mathrm{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i \, e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}$$
Thus, $\mathrm{S}_\alpha$ has the following useful properties:
- $\mathrm{S}_\alpha \to \max$ as $\alpha \to \infty$.
- $\mathrm{S}_0$ is the arithmetic mean of its inputs.
- $\mathrm{S}_\alpha \to \min$ as $\alpha \to -\infty$.
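As an illustration, here is a minimal NumPy sketch of this weighted-average formulation (the function name `smooth_max` and the stability shift are our own choices, not part of any standard API):

```python
import numpy as np

def smooth_max(x, alpha=1.0):
    """Boltzmann-weighted smooth maximum S_alpha(x_1, ..., x_n)."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    # Shift the exponents by their max so np.exp never overflows;
    # the exponential weights (and hence their weighted average)
    # are invariant to this shift.
    w = np.exp(z - z.max())
    return float(np.sum(x * w) / np.sum(w))
```

For example, with `x = [1.0, 2.0, 3.0]`, `alpha=10.0` returns approximately 3.0, `alpha=0.0` returns the arithmetic mean 2.0, and `alpha=-10.0` returns approximately 1.0.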
LogSumExp
Another option for a smooth maximum function is the LogSumExp:

$$\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n e^{\alpha x_i}$$
This formulation can be derived from the entropic regularization procedure used in reinforcement learning.
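A short NumPy sketch of this formulation (again, the name `log_sum_exp_max` is illustrative) might look like:

```python
import numpy as np

def log_sum_exp_max(x, alpha=1.0):
    """Smooth maximum via LogSumExp: (1/alpha) * log(sum(exp(alpha * x))).

    alpha must be nonzero; negative alpha yields a smooth minimum.
    """
    x = np.asarray(x, dtype=float)
    z = alpha * x
    m = z.max()
    # Factor out the largest exponent so np.exp stays bounded.
    return float((m + np.log(np.sum(np.exp(z - m)))) / alpha)
```

For positive $\alpha$, $\max_i x_i \le \mathrm{LSE}_\alpha(x) \le \max_i x_i + \log(n)/\alpha$, so the approximation overshoots the true maximum and tightens as $\alpha$ grows.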
p-Norm
Another smooth maximum is the p-norm,

$$\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p},$$

which tends to the maximum of the absolute values, $\max_i |x_i|$, as $p \to \infty$.
An intrinsic advantage of the p-norm is that it is a true norm. As such, it is “scale invariant” (absolutely homogeneous): $\|\lambda x\|_p = |\lambda| \, \|x\|_p$.
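A minimal NumPy sketch of the p-norm as a smooth maximum (the name `p_norm_max` and the factoring trick are our own choices) could be:

```python
import numpy as np

def p_norm_max(x, p=8.0):
    """Approximate max(|x_i|) with the p-norm (sum |x_i|^p)^(1/p)."""
    x = np.asarray(x, dtype=float)
    m = np.abs(x).max()
    if m == 0.0:
        return 0.0
    # Factor out the largest magnitude so the powers stay in [0, 1]
    # and cannot overflow even for large p.
    return float(m * np.sum((np.abs(x) / m) ** p) ** (1.0 / p))
```

Note that the p-norm approximates the maximum of the absolute values and overestimates it: $\max_i |x_i| \le \|x\|_p \le n^{1/p} \max_i |x_i|$.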