Entropy, Entropy examples, Energy, Free Energy Principle, Poisson distribution
Andrius: Steph has asked me to study Jaynes's paper introducing his Maximum Entropy Principle. This principle is also central to Jere Northrop's thinking. It is key to understanding how energy and entropy are related and that interests me.
Maximum Entropy Principle
The Maximum Entropy Principle has us determine a probability distribution by admitting what is known and maximizing our ignorance (entropy, uncertainty, ambiguity) of what is not known.
The mathematics of the Maximum Entropy Principle is understood by way of Lagrange multipliers.
We are maximizing the entropy {$S(p_1,p_2,\dots ,p_n)=K\sum_{i=1}^n p_i\log \frac{1}{p_i}$}, where {$K$} is a positive constant.
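To make the objective concrete, here is a minimal Python sketch of this entropy (assuming the natural logarithm, taking {$K=1$} by default, and treating {$0\log\frac{1}{0}$} as {$0$}; the example distributions are arbitrary):

```python
import numpy as np

def entropy(p, K=1.0):
    # S = K * sum_i p_i * log(1/p_i), with the convention 0 * log(1/0) = 0
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    return K * np.sum(p[nonzero] * np.log(1.0 / p[nonzero]))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ~ 1.386, maximal for n = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~ 0.940, a less uncertain distribution
```

The uniform distribution gives the largest value, which is what the derivation below establishes.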
Simple case: discrete probabilities
Suppose all that is known is that
- {$\sum_{i=1}^n p_i=1$} the probabilities sum to {$1$}
Define the constraint {$g(p_1,p_2,\dots , p_n)=\sum_{i=1}^n p_i - 1 = 0$}.
Consider {$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda)=S(p_1,p_2,\dots ,p_n)+\lambda g(p_1,p_2,\dots ,p_n)= K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda (\sum_{i=1}^n p_i - 1)$}.
Note that {$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda)=S(p_1,p_2,\dots ,p_n)$} for the points which interest us, namely, the points that satisfy the constraint {$g(p_1,p_2,\dots , p_n)=0$}.
{$\frac{\partial\mathcal{L}}{\partial \lambda}=0$} expresses the constraint that the points satisfy, namely {$g(p_1,p_2,\dots , p_n)=0$}.
{$\frac{\partial\mathcal{L}}{\partial p_i}=0$} for all {$i$} must be true at a point of maximum entropy (subject to the constraints)
{$\frac{\partial}{\partial p_i} [K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda (\sum_{i=1}^n p_i - 1)]=0$}
{$K\log \frac{1}{p_i} + Kp_i \frac{1}{\frac{1}{p_i}}\frac{-1}{p_i^2} + \lambda=0$}
{$K\log \frac{1}{p_i} - K + \lambda=0$}
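A symbolic check of this derivative, differentiating the terms of {$\mathcal{L}$} that involve one particular {$p_i$} (a sketch using sympy; the constant term {$-\lambda$} drops out):

```python
import sympy as sp

p, K, lam = sp.symbols('p K lambda', positive=True)

# The terms of the Lagrangian involving a single p_i: K*p*log(1/p) + lambda*p
term = K * p * sp.log(1 / p) + lam * p

# Its derivative should equal K*log(1/p) - K + lambda
difference = sp.diff(term, p) - (K * sp.log(1 / p) - K + lam)
print(sp.simplify(difference))  # prints 0
```

Solving {$K\log \frac{1}{p_i} - K + \lambda=0$} for {$p_i$}: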
{$K\log \frac{1}{p_i} = K - \lambda$}
{$\frac{1}{p_i} = e^{1-\frac{1}{K}\lambda}$}
{$p_i=e^{\frac{1}{K}\lambda -1}$}
Note that {$p_i$} for all {$i$} have the same value. There are {$n$} probabilities and so we conclude {$p_i=\frac{1}{n}$}. This is the uniform distribution.
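This conclusion can be checked numerically. Below is a sketch using scipy's constrained optimizer (the choice of {$n=5$}, the starting point, and the optimizer are incidental): maximizing the entropy subject only to normalization returns the uniform distribution.

```python
import numpy as np
from scipy.optimize import minimize

n = 5

def neg_entropy(p):
    # Minimize -S (with K = 1) so the optimizer maximizes S; clip to avoid log(0).
    p = np.clip(p, 1e-12, None)
    return float(np.sum(p * np.log(p)))

constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]  # probabilities sum to 1
bounds = [(0.0, 1.0)] * n
p0 = np.random.dirichlet(np.ones(n))  # an arbitrary starting distribution

result = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(result.x)  # approximately [0.2, 0.2, 0.2, 0.2, 0.2], that is, p_i = 1/n
```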
Case: we know an expected value
In this case, what is known is that
- {$\sum_{i=1}^n p_i=1$} the probabilities sum to {$1$}
- {$\left< f(x_1,x_2,\dots ,x_n) \right> = \sum_{i=1}^n p_if(x_i)$} the expected value of {$f$} is known
We use two Lagrange multipliers {$\lambda$} and {$\mu$} to incorporate these two constraints. We define
{$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda,\mu)=S(p_1,p_2,\dots ,p_n)+\lambda g_{\lambda}(p_1,p_2,\dots ,p_n) + \mu g_{\mu}(p_1,p_2,\dots ,p_n, x_1,x_2,\dots ,x_n)$}
{$\mathcal{L}(p_1,p_2,\dots ,p_n,\lambda,\mu)= K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda \cdot (\sum_{i=1}^n p_i - 1) + \mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)$}
{$\frac{\partial\mathcal{L}}{\partial p_i}=0$} for all {$i$} must be true at a point of maximum entropy (subject to the constraints)
{$\frac{\partial}{\partial p_i} [K\sum_{i=1}^n p_i\log \frac{1}{p_i} + \lambda \cdot (\sum_{i=1}^n p_i - 1) + \mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)] =0$}
We can reuse the calculation for the simpler case to get
{$K\log \frac{1}{p_i} - K + \lambda + \frac{\partial}{\partial p_i} [\mu \cdot (\sum_{i=1}^n p_if(x_i) - \left< f(x_1,x_2,\dots ,x_n)\right>)] =0$}
{$K\log \frac{1}{p_i} - K + \lambda + \mu \frac{\partial}{\partial p_i} [p_if(x_i)]=0$}
{$K\log \frac{1}{p_i} - K + \lambda + \mu f(x_i) = 0$}
{$\log \frac{1}{p_i} = \frac{1}{K}(K - \lambda - \mu f(x_i))$}
{$\frac{1}{p_i} = e^{\frac{1}{K}(K - \lambda - \mu f(x_i))}$}
{$p_i = e^{\frac{1}{K}(\lambda + \mu f(x_i) - K)}$}
Let's choose {$K=1$}
{$p_i = e^{\lambda + \mu f(x_i) - 1}$}
{$p_i = e^{-(1-\lambda) - (-\mu) f(x_i)}$}
The sign of a Lagrange multiplier is a matter of convention, so let's rename {$1-\lambda$} as {$\lambda$} and {$-\mu$} as {$\mu$}.
Jaynes got {$p_i = e^{-\lambda - \mu f(x_i)}$}, where the values of {$\lambda$} and {$\mu$} are determined by the two constraints.
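As a worked example (the outcomes {$1,\dots ,6$} and the known mean {$4.5$} are an illustrative choice), take {$f(x_i)=x_i$} on a six-sided die and suppose only the expected value {$\left< x\right> =4.5$} is known. The expected-value constraint fixes {$\mu$}, and {$e^{-\lambda}$} is just the normalization:

```python
import numpy as np
from scipy.optimize import brentq

x = np.arange(1, 7)   # outcomes x_i = 1, ..., 6, with f(x_i) = x_i
target_mean = 4.5     # the known expected value (illustrative)

def mean_given_mu(mu):
    # p_i proportional to exp(-mu * x_i); e^{-lambda} is the normalization factor
    w = np.exp(-mu * x)
    p = w / w.sum()
    return p @ x

# Choose mu so that the expected value constraint holds.
mu = brentq(lambda m: mean_given_mu(m) - target_mean, -5.0, 5.0)
p = np.exp(-mu * x)
p /= p.sum()
print(mu)      # negative here, since 4.5 exceeds the uniform mean 3.5
print(p)       # probabilities grow with x_i
print(p @ x)   # recovers 4.5
```

With only the normalization constraint ({$\mu=0$}) this reduces to the uniform distribution, and with {$f(x_i)=E_i$} interpreted as energy it is the Boltzmann distribution.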
The product {$p_iE_i$}
Consider the partial derivative of {$p_iU$}, where {$U$} is the internal energy.
Literature
- E. T. Jaynes. Information Theory and Statistical Mechanics.
- Wikipedia: Principle of maximum entropy
- Wikipedia: Lagrange multiplier
- Wikipedia: Maximum entropy probability distribution: Other examples
- Alex B. Kiefer. On the possibility of deep alignment.
- Xiang Gao, Emilio Gallicchio, Adrian E. Roitberg. The Generalized Boltzmann Distribution is the Only Distribution in Which the Gibbs-Shannon Entropy Equals the Thermodynamic Entropy.