Shannon Information Content
For an event $x$ with probability $p(x)$, its Shannon Information Content is defined as:
$$I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)$$
Intuitively, it measures the “surprise” or “unlikeliness” of an event or an outcome of a probability distribution. An event that always happens gives Shannon Information Content 0, while an event with probability 0 gives infinity.
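As a quick illustration (my own sketch, using base-2 logarithms so the result is in bits), the definition can be computed directly:

```python
import math

def information_content(p: float) -> float:
    """Shannon Information Content -log2(p) of an event with probability p, in bits."""
    if p == 0:
        return math.inf  # an impossible event would be infinitely surprising
    return -math.log2(p)

print(information_content(1.0))   # 0.0   -- an event that always happens carries no surprise
print(information_content(0.5))   # 1.0   -- a fair coin flip carries 1 bit
print(information_content(0.01))  # ~6.64 -- rare events are very surprising
print(information_content(0.0))   # inf
```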
Shannon Entropy
We can think of Shannon Entropy as the measurement of “surprise”, uncertainty, or randomness of a random variable, or of the distribution associated with it. For a discrete distribution $P$, it is defined as:
$$H(P) = \mathbb{E}_{x \sim P}\left[\log_2 \frac{1}{p(x)}\right] = -\sum_x p(x) \log_2 p(x)$$
By looking at the equation, we can interpret it as the expectation of how “surprising” the outcomes of the distribution are, sampling the outcomes from the distribution itself.
Intuitively, discrete distributions with more possible outcomes will have higher Entropy (randomness). A distribution with equal likelihood for each outcome will also have higher Entropy (uncertainty) than a distribution with biased likelihoods.
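A small Python sketch (my own example distributions, base-2 logarithms) showing both intuitions, that more outcomes and more even likelihoods mean higher Entropy:

```python
import math

def entropy(dist: list[float]) -> float:
    """Shannon Entropy of a discrete distribution in bits, treating 0 * log 0 as 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))               # 1.0   -- fair coin
print(entropy([0.9, 0.1]))               # ~0.47 -- biased coin: less uncertainty
print(entropy([0.25, 0.25, 0.25, 0.25])) # 2.0   -- four equally likely outcomes: more randomness
```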
KL Divergence
The KL Divergence measures how much two distributions $P$ and $Q$ diverge.
Similar to Shannon Entropy, we can use the following formula (the cross-entropy) to measure the expectation of the “surprise” or “unlikeliness” of a distribution $Q$, but taking samples from another distribution $P$:
$$H(P, Q) = \mathbb{E}_{x \sim P}\left[\log_2 \frac{1}{q(x)}\right] = -\sum_x p(x) \log_2 q(x)$$
If $Q = P$, this just gives the entropy of $P$. But it will always be greater (more randomness) than $H(P)$ when $Q \neq P$.
The KL Divergence measures the excess of this “surprise” over the baseline of $H(P)$:
$$D_{KL}(P \,\|\, Q) = H(P, Q) - H(P) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$$
Note that $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in general.
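The decomposition above can be checked numerically. A minimal Python sketch (the distributions p and q are made up for illustration) showing that the KL Divergence equals the cross-entropy minus the entropy, and that the two directions differ:

```python
import math

def cross_entropy(p, q):
    """Expected surprise under q when sampling from p, in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return cross_entropy(p, p)

def kl_divergence(p, q):
    """Excess surprise of using q in place of p: H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(cross_entropy(p, q) >= entropy(p))         # True: cross-entropy is at least the entropy
print(kl_divergence(p, q), kl_divergence(q, p))  # two different (small, positive) numbers
```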
Suppose we have two experiments:
- Flipping a fair coin, where heads and tails each come up with probability 1/2
- Flipping a biased coin that always returns heads
Let $P$ and $Q$ be the distributions of the outcomes of the two experiments respectively.
$D_{KL}(Q \,\|\, P) = 1$ because the only outcome under $Q$ is heads, contributing $1 \cdot \log_2 \frac{1}{1/2} = 1$ bit (the tails term vanishes since $q(\text{tails}) = 0$). But,
$D_{KL}(P \,\|\, Q) = \infty$ because $p(\text{tails}) = \frac{1}{2} > 0$ while $q(\text{tails}) = 0$, so the term $\frac{1}{2} \log_2 \frac{1/2}{0}$ blows up.
Thus, $D_{KL}(Q \,\|\, P)$ equals 1, but $D_{KL}(P \,\|\, Q)$ explodes to infinity.
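A quick Python check of the coin example (a sketch; the outcome order [heads, tails] and the zero-handling conventions are assumptions on my part):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits, with 0 * log(0/q) taken as 0 and p * log(p/0) as infinity."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue           # outcomes p never produces contribute nothing
        if qi == 0:
            return math.inf    # p puts mass where q puts none
        total += pi * math.log2(pi / qi)
    return total

P = [0.5, 0.5]  # fair coin: [heads, tails]
Q = [1.0, 0.0]  # biased coin that always lands heads

print(kl_divergence(Q, P))  # 1.0 -- exactly one bit of excess surprise
print(kl_divergence(P, Q))  # inf -- P allows tails, which Q rules out completely
```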