Entropy, Entropy examples, Energy, Free Energy Principle, Poisson distribution
Andrius: Steph has asked me to study Jaynes's paper introducing his Maximum Entropy Principle. This principle is also central to Jere Northrop's thinking. It is key to understanding how energy and entropy are related and that interests me.
Maximum Entropy Principle
The Maximum Entropy Principle has us determine a probability distribution by admitting what is known and maximizing our ignorance (entropy, uncertainty, ambiguity) of what is not known.
The mathematics of the Maximum Entropy Principle is understood by way of Lagrange multipliers.
We are maximizing the entropy {$S(p_1,p_2,\dots ,p_n)=K\sum_{i=1}^n p_i\log \frac{1}{p_i}$}, where {$K$} is a positive constant.
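To make the objective concrete, here is a minimal Python sketch of this entropy (assuming the natural logarithm, taking {$K=1$} by default, and treating {$0\log\frac{1}{0}$} as {$0$}; the example distributions are arbitrary):

```python
import numpy as np

def entropy(p, K=1.0):
    # S = K * sum_i p_i * log(1/p_i), with the convention 0 * log(1/0) = 0
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    return K * np.sum(p[nonzero] * np.log(1.0 / p[nonzero]))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ~ 1.386, maximal for n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~ 0.940, a less uncertain distribution
```

The uniform distribution gives the largest value, which is what the derivation below establishes.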
Simple case: discrete probabilities
Suppose all that is known is that
- {$\sum_{i=1}^n p_i=1$} the probabilities sum to {$1$}
Define the constraint {$g(p_1,p_2,\dots , p_n)=\sum_{i=1}^n p_i - 1 = 0$}.
Consider {$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda)=S(p_1,p_2,\dots ,p_n)+\lambda g(p_1,p_2,\dots ,p_n)= K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda (\sum_{i=1}^n p_i - 1)$}.
Note that {$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda)=S(p_1,p_2,\dots ,p_n)$} for the points which interest us, namely, the points that satisfy the constraint {$g(p_1,p_2,\dots , p_n)=0$}.
{$\frac{\partial\mathcal{L}}{\partial \lambda}=0$} expresses the constraint that the points satisfy, namely {$g(p_1,p_2,\dots , p_n)=0$}.
{$\frac{\partial\mathcal{L}}{\partial p_i}=0$} for all {$i$} must be true at a point of maximum entropy (subject to the constraints)
{$\frac{\partial}{\partial p_i} [K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda (\sum_{i=1}^n p_i - 1)]=0$}
{$K\log \frac{1}{p_i} + Kp_i \frac{1}{\frac{1}{p_i}}\frac{-1}{p_i^2} + \lambda=0$}
{$K\log \frac{1}{p_i} - K + \lambda=0$}
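A symbolic check of this derivative, differentiating the terms of {$\mathcal{L}$} that involve one particular {$p_i$} (a sketch using sympy; the constant term {$-\lambda$} drops out):

```python
import sympy as sp

p, K, lam = sp.symbols('p K lambda', positive=True)

# The terms of the Lagrangian involving a single p_i: K*p*log(1/p) + lambda*p
term = K * p * sp.log(1 / p) + lam * p

# Its derivative should equal K*log(1/p) - K + lambda
difference = sp.diff(term, p) - (K * sp.log(1 / p) - K + lam)
print(sp.simplify(difference))  # prints 0
```

Solving {$K\log \frac{1}{p_i} - K + \lambda=0$} for {$p_i$}: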
{$K\log \frac{1}{p_i} = K - \lambda$}
{$\frac{1}{p_i} = e^{1-\frac{1}{K}\lambda}$}
{$p_i=e^{\frac{1}{K}\lambda -1}$}
Note that {$p_i$} for all {$i$} have the same value. There are {$n$} probabilities and so we conclude {$p_i=\frac{1}{n}$}. This is the uniform distribution.
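This conclusion can be checked numerically. Below is a sketch using scipy's constrained optimizer (the choice of {$n=5$}, the starting point, and the optimizer are incidental): maximizing the entropy subject only to normalization returns the uniform distribution.

```python
import numpy as np
from scipy.optimize import minimize

n = 5

def neg_entropy(p):
    # Minimize -S (with K = 1) so the optimizer maximizes S; clip to avoid log(0).
    p = np.clip(p, 1e-12, None)
    return float(np.sum(p * np.log(p)))

constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]  # probabilities sum to 1
bounds = [(0.0, 1.0)] * n
p0 = np.random.dirichlet(np.ones(n))  # an arbitrary starting distribution

result = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(result.x)  # approximately [0.2, 0.2, 0.2, 0.2, 0.2], that is, p_i = 1/n
```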
Case: we know an expected value
In this case, what is known is that
- {$\sum_{i=1}^n p_i=1$} the probabilities sum to {$1$}
- {$\left< f(x_1,x_2,\dots ,x_n) \right> = \sum_{i=1}^n p_if(x_i)$} the expected value of {$f$} is known
We use two Lagrange multipliers {$\lambda$} and {$\mu$} to incorporate these two constraints. We define
{$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda,\mu)=S(p_1,p_2,\dots ,p_n)+\lambda g_{\lambda}(p_1,p_2,\dots ,p_n) + \mu g_{\mu}(p_1,p_2,\dots ,p_n, x_1,x_2,\dots ,x_n)$}
{$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda,\mu)= K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda \cdot (\sum_{i=1}^n p_i - 1) + \mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)$}
{$\frac{\partial\mathcal{L}}{\partial p_i}=0$} for all {$i$} must be true at a point of maximum entropy (subject to the constraints)
{$\frac{\partial}{\partial p_i} [K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda \cdot (\sum_{i=1}^n p_i - 1) + \mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)] =0$}
We can reuse the calculation for the simpler case to get
{$K\log \frac{1}{p_i} - K + \lambda + \frac{\partial}{\partial p_i} [\mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)] =0$}
{$K\log \frac{1}{p_i} - K + \lambda + \mu \frac{\partial}{\partial p_i} [p_if(x_i)]=0$}
{$K\log \frac{1}{p_i} - K + \lambda + \mu f(x_i) = 0$}
{$\log \frac{1}{p_i} = \frac{1}{K}(K - \lambda - \mu f(x_i))$}
{$\frac{1}{p_i} = e^{\frac{1}{K}(K - \lambda - \mu f(x_i))}$}
{$p_i = e^{\frac{1}{K}(\lambda + \mu f(x_i) - K)}$}
Let's choose {$K=1$}
{$p_i = e^{\lambda + \mu f(x_i) - 1}$}
{$p_i = e^{-(1-\lambda) - (-\mu) f(x_i)}$}
The sign of a Lagrange multiplier is a matter of convention, so let's rename {$1-\lambda$} as {$\lambda$} and {$-\mu$} as {$\mu$}.
Jaynes got {$p_i = e^{-\lambda - \mu f(x_i)}$}, where the values of {$\lambda$} and {$\mu$} are determined by the two constraints.
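As a worked example (the outcomes {$1,\dots ,6$} and the known mean {$4.5$} are an illustrative choice), take {$f(x_i)=x_i$} on a six-sided die and suppose only the expected value {$\left< x\right> =4.5$} is known. The expected-value constraint fixes {$\mu$}, and {$e^{-\lambda}$} is just the normalization:

```python
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)   # outcomes x_i = 1, ..., 6, with f(x_i) = x_i
target_mean = 4.5     # the known expected value (illustrative)

def mean_given_mu(mu):
    # p_i proportional to exp(-mu * x_i); e^{-lambda} is the normalization factor
    w = np.exp(-mu * x)
    p = w / w.sum()
    return p @ x

# Choose mu so that the expected value constraint holds.
mu = brentq(lambda m: mean_given_mu(m) - target_mean, -5.0, 5.0)
p = np.exp(-mu * x)
p /= p.sum()
print(mu)      # negative here, since 4.5 exceeds the uniform mean 3.5
print(p)       # probabilities grow with x_i
print(p @ x)   # recovers 4.5
```

With only the normalization constraint ({$\mu=0$}) this reduces to the uniform distribution, and with {$f(x_i)=E_i$} interpreted as energy it is the Boltzmann distribution.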
The product {$p_iE_i$}
Consider the partial derivative of {$p_iU$}, where {$U$} is the internal energy.
Literature
- E. T. Jaynes. Information Theory and Statistical Mechanics.
- Wikipedia: Principle of maximum entropy
- Wikipedia: Lagrange multiplier
- Wikipedia: Maximum entropy probability distribution: Other examples
- Alex B. Kiefer. On the possibility of deep alignment.
- Xiang Gao, Emilio Gallicchio, Adrian E. Roitberg. The Generalized Boltzmann Distribution is the Only Distribution in Which the Gibbs-Shannon Entropy Equals the Thermodynamic Entropy.