The concept of entropy rate allows us to talk about entropy for sequences of random variables that are not independent.
Up to this point we have assumed that the random variables we deal with are independent and identically distributed. Of course, in the real world, independence is not commonly encountered: the letters emerging from a stream of text are not independent.
In this lecture we will introduce the means of treating sequences of dependent random variables.
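The entropy rate of a stochastic process $\{X_i\}$ is defined as

$$H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n),$$

when the limit exists. There is also a related quantity for entropy:

$$H'(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_1).$$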
These are two different concepts of entropy: the first, $H(X)$, is the (average) per-symbol entropy of all $n$ random variables. The second, $H'(X)$, is the conditional entropy of the last random variable, conditioned upon all prior random variables. However (and somewhat surprisingly), for stationary sequences these are the same in the limit:

$$H(X) = H'(X).$$
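Before proving this, we will prove another necessary result: for a stationary stochastic process, $H(X_n \mid X_{n-1}, \ldots, X_1)$ is nonincreasing in $n$, so the limit $H'(X)$ exists:

$$H(X_{n+1} \mid X_n, \ldots, X_1) \le H(X_{n+1} \mid X_n, \ldots, X_2) = H(X_n \mid X_{n-1}, \ldots, X_1).$$

The inequality holds because conditioning cannot increase entropy, and the equality follows from stationarity. A nonincreasing sequence of nonnegative numbers has a limit.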
And now a result from analysis (the Cesàro mean): if $a_n \to a$ and $b_n = \frac{1}{n} \sum_{i=1}^{n} a_i$, then $b_n \to a$.
The idea is that since most of the numbers in $a_n$ are eventually close to $a$, then $b_n$, which is the average of the first $n$ terms, must also be close to $a$: as $n$ gets large, the first terms become increasingly less important.
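As a quick numerical illustration of this idea, the sketch below averages a sequence that converges to 1. The sequence $a_n = 1 + 5/n$ and the helper name `cesaro_means` are assumptions chosen only for this example; they are not from the lecture.

```python
def cesaro_means(a):
    """Return the running averages b_n = (a_1 + ... + a_n) / n for n = 1..len(a)."""
    means = []
    total = 0.0
    for n, term in enumerate(a, start=1):
        total += term
        means.append(total / n)
    return means


# Hypothetical sequence a_n = 1 + 5/n: it converges to a = 1 but starts far from it.
a = [1 + 5 / n for n in range(1, 10001)]
b = cesaro_means(a)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        # b_n lags behind a_n, but both approach the same limit.
        print(n, round(a[n - 1], 4), round(b[n - 1], 4))
```

The averages approach the limit more slowly than the sequence itself, because the early terms are only gradually diluted by the later ones.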
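The proof is a lot of the $\epsilon$-$\delta$-ish sort of stuff that analysts thrive on and most of us simply tolerate at best. Let $\epsilon > 0$. Since $a_n \to a$, there is an $N$ such that $|a_n - a| \le \epsilon$ for all $n \ge N$. Then, for $n \ge N$,

$$|b_n - a| = \left| \frac{1}{n} \sum_{i=1}^{n} (a_i - a) \right| \le \frac{1}{n} \sum_{i=1}^{N} |a_i - a| + \frac{n - N}{n}\,\epsilon \le \frac{1}{n} \sum_{i=1}^{N} |a_i - a| + \epsilon.$$

The first term on the right is a fixed sum divided by $n$, so it goes to zero as $n \to \infty$. Hence $|b_n - a| \le 2\epsilon$ for all sufficiently large $n$, and since $\epsilon$ is arbitrary, $b_n \to a$.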
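Now we can prove the equality $H(X) = H'(X)$. By the chain rule,

$$\frac{1}{n} H(X_1, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$

For a stationary process the conditional entropies on the right converge to $H'(X)$, so by the result from analysis above their running average converges to the same limit:

$$H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_1) = H'(X).$$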
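The equality can also be checked numerically. The following is a minimal sketch, assuming a hypothetical two-state stationary Markov chain (the transition matrix `P` and all function names are chosen only for illustration). It computes both quantities by brute-force enumeration of every length-$n$ sequence.

```python
import itertools
import math

# Hypothetical two-state Markov chain (values chosen only for illustration).
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain: mu = (q, p) / (p + q),
# where p = P[0][1] and q = P[1][0].
p01, p10 = P[0][1], P[1][0]
mu = [p10 / (p01 + p10), p01 / (p01 + p10)]


def joint_entropy(n):
    """H(X_1, ..., X_n) in bits, enumerating all 2^n sequences."""
    h = 0.0
    for seq in itertools.product(range(2), repeat=n):
        prob = mu[seq[0]]  # the chain starts in its stationary distribution
        for s, t in zip(seq, seq[1:]):
            prob *= P[s][t]
        if prob > 0:
            h -= prob * math.log2(prob)
    return h


def per_symbol_entropy(n):
    """(1/n) H(X_1, ..., X_n)."""
    return joint_entropy(n) / n


def conditional_entropy(n):
    """H(X_n | X_{n-1}, ..., X_1) = H(X_1..X_n) - H(X_1..X_{n-1})."""
    if n == 1:
        return joint_entropy(1)
    return joint_entropy(n) - joint_entropy(n - 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, round(per_symbol_entropy(n), 4), round(conditional_entropy(n), 4))
```

For this chain the conditional entropy settles at its limit as soon as $n \ge 2$, while the per-symbol entropy approaches the same value more slowly, as the Cesàro-mean argument suggests.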
The generalization of the AEP theorem of the last chapter is true (but we won't prove it here): for a stationary ergodic sequence of r.v.s (identically distributed, but not necessarily independent),

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X)$$

with probability 1 (strong convergence!). Based on this generalization, it is possible to define a notion of typical sequences, and determine the number of typical sequences (approximately $2^{nH(X)}$), each with probability about $2^{-nH(X)}$. A representation therefore exists which requires approximately $nH(X)$ bits.
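To make the convergence concrete, here is a small simulation sketch. It assumes a hypothetical two-state Markov chain (stationary and ergodic) with an illustrative transition matrix `P`; the function names and the seed are likewise assumptions. It samples one long sequence and compares $-\frac{1}{n}\log_2 p(x_1, \ldots, x_n)$ with the entropy rate.

```python
import math
import random

# Hypothetical two-state Markov chain (values chosen only for illustration).
P = [[0.9, 0.1],
     [0.4, 0.6]]
mu = [0.8, 0.2]  # stationary distribution: mu P = mu for the matrix above


def entropy_rate():
    """Entropy rate in bits: H(X_2 | X_1) for a stationary Markov chain."""
    return -sum(mu[i] * P[i][j] * math.log2(P[i][j])
                for i in range(2) for j in range(2))


def sample_log_prob_rate(n, seed=0):
    """Sample x_1..x_n from the chain and return -(1/n) log2 p(x_1..x_n)."""
    rng = random.Random(seed)
    state = 0 if rng.random() < mu[0] else 1
    log_p = math.log2(mu[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        log_p += math.log2(P[state][nxt])
        state = nxt
    return -log_p / n


if __name__ == "__main__":
    h = entropy_rate()
    print("entropy rate (bits/symbol):", round(h, 4))
    for n in (100, 1000, 10000, 100000):
        print(n, round(sample_log_prob_rate(n, seed=1), 4))
    # About n * H(X) bits suffice to index the roughly 2^(n H(X)) typical sequences.
    print("bits for n = 1000:", round(1000 * h))
```

The sampled values fluctuate for small $n$ and tighten around the entropy rate as $n$ grows, which is the behavior the theorem describes.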
There is a lot more material in the chapter about Markov processes and Hidden Markov models. However, in the interest of moving toward our goal, I will not talk about it in class.