Entropy Rates

The concept of entropy rate allows us to talk about entropy for sequences of random variables that are not independent.

Up to this point we have assumed that the random variables we deal with are independent and identically distributed. In the real world, however, independence is rarely encountered: the letters emerging from a stream of text, for example, are not independent.

In this lecture we will introduce the means of treating sequences of dependent random variables.

The {\bf entropy rate} of a stochastic process $\{X_i\}$ is
\begin{displaymath}H(\Xc) = \lim_{n\rightarrow \infty} \frac{1}{n} H(X_1,X_2,\ldots,X_n),
\end{displaymath}
provided that the limit exists.
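
As a quick check on the definition, consider the special case of i.i.d. variables (an illustrative case, worked here under the assumption that the entropy rate is the limit of the per-symbol entropy $\frac{1}{n}H(X_1,\ldots,X_n)$). Independence makes the joint entropy additive, so
\begin{displaymath}H(\Xc) = \lim_{n\rightarrow \infty} \frac{1}{n} \sum_{i=1}^n H(X_i)
= \lim_{n\rightarrow \infty} \frac{nH(X_1)}{n} = H(X_1),
\end{displaymath}
and the entropy rate reduces to the ordinary entropy of a single variable.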

There is also a related quantity for entropy:


\begin{displaymath}H'(\Xc) = \lim_{n\rightarrow \infty}
H(X_n\vert X_{n-1},X_{n-2},\ldots,X_{1}).
\end{displaymath}


These are two different concepts of entropy: the first is the (average) per-symbol entropy of all $n$ random variables. The second is the conditional entropy of the latest random variable, conditioned upon all prior random variables. Somewhat surprisingly, however, for stationary sequences these are the same:

For a stationary stochastic process, the two entropy rates defined above are equal:
\begin{displaymath}H(\Xc) = H'(\Xc).
\end{displaymath}

Before proving this, we prove a necessary preliminary result, namely that $\lim H(X_n\vert X_{n-1},X_{n-2},\ldots,X_1)$ exists:

For a stationary random process, $H(X_n\vert X_{n-1},X_{n-2},\ldots,X_1)$ is nonincreasing in $n$ and has a limit $H'(\Xc)$.

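\begin{displaymath}H(X_{n+1}\vert X_1,X_2,\ldots,X_n) \leq H(X_{n+1}\vert X_2,\ldots,X_n)
= H(X_n\vert X_1,\ldots,X_{n-1}),
\end{displaymath}
where the inequality holds because conditioning reduces entropy and the equality follows from stationarity. The conditional entropies therefore form a nonincreasing sequence of
non-negative numbers, and must have a limit.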

And now a result from analysis:
(Cesaro mean) If $a_n\rightarrow a$ and $b_n =
\frac{1}{n}\sum_{i=1}^n a_i$ then $b_n \rightarrow a$.
The idea is that since most of the numbers in $a_n$ are eventually close to $a$, then $b_n$, which is the average of the first $n$ terms, must also be close to $a$: as $n$ gets large, the first terms become increasingly less important.
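
The Cesaro mean idea can be checked numerically. The following is a minimal sketch, not part of the original lecture: the sequence $a_n = 1 + 1/n$ and the helper name `cesaro_means` are assumptions chosen only for illustration.

```python
def cesaro_means(a):
    """Return the running averages b_n = (a_1 + ... + a_n) / n."""
    means = []
    total = 0.0
    for n, a_n in enumerate(a, start=1):
        total += a_n
        means.append(total / n)
    return means

# Hypothetical example sequence: a_n = 1 + 1/n, which converges to a = 1.
N = 100_000
a = [1.0 + 1.0 / n for n in range(1, N + 1)]
b = cesaro_means(a)

# Print the distance of a_n and b_n from the limit at a few values of n.
for n in (10, 1_000, 100_000):
    print(n, abs(a[n - 1] - 1.0), abs(b[n - 1] - 1.0))
```

Both distances shrink toward zero, but the averages $b_n$ approach the limit more slowly than $a_n$ itself, because the early terms take a while to be averaged out.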

The proof is the $\epsilon$-ish sort of argument that analysts thrive on and most of us simply tolerate at best:
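Since $a_n \rightarrow a$, for any $\epsilon>0$ there is a number $N(\epsilon)$ such that $\vert a_n - a\vert \leq \epsilon$ for all $n \geq N(\epsilon)$. Then, for $n \geq N(\epsilon)$,
\begin{displaymath}\vert b_n - a\vert = \left\vert \frac{1}{n}\sum_{i=1}^n (a_i - a)\right\vert
\leq \frac{1}{n}\sum_{i=1}^{N(\epsilon)} \vert a_i - a\vert + \frac{n-N(\epsilon)}{n}\epsilon
\leq \frac{1}{n}\sum_{i=1}^{N(\epsilon)} \vert a_i - a\vert + \epsilon.
\end{displaymath}
The first term goes to zero as $n\rightarrow\infty$, so the difference $\vert b_n -a\vert$ can be made
as small as desired.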

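Now we can prove the equality $H(\Xc) = H'(\Xc)$.
By the chain rule,
\begin{displaymath}\frac{H(X_1,X_2,\ldots,X_n)}{n} = \frac{1}{n}\sum_{i=1}^n H(X_i\vert X_{i-1},\ldots,X_1).
\end{displaymath}
The conditional entropies on the right converge to $H'(\Xc)$, so by the Cesaro mean result their running average converges to the same limit:
\begin{displaymath}H(\Xc) = \lim \frac{H(X_1,X_2,\ldots,X_n)}{n} = \lim
H(X_n\vert X_{n-1},\ldots,X_1) = H'(\Xc).
\end{displaymath}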
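
To see the two quantities approach a common value, here is a small numerical sketch. It is an illustration only: the two-state stationary Markov chain, its transition matrix `P`, and the function name `joint_entropy` are assumptions introduced here, not taken from the lecture. The joint entropy is computed by brute force over all sequences, and the conditional entropy is obtained from the chain rule as a difference of joint entropies.

```python
import itertools
import math

# Hypothetical two-state Markov chain, used only as an illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain with P[0][1] = alpha and
# P[1][0] = beta is (beta, alpha) / (alpha + beta).
alpha, beta = P[0][1], P[1][0]
mu = [beta / (alpha + beta), alpha / (alpha + beta)]

def joint_entropy(n):
    """H(X_1,...,X_n) in bits, summed by brute force over all 2^n sequences."""
    h = 0.0
    for seq in itertools.product((0, 1), repeat=n):
        p = mu[seq[0]]  # start the chain in its stationary distribution
        for prev, cur in zip(seq, seq[1:]):
            p *= P[prev][cur]
        h -= p * math.log2(p)
    return h

per_symbol = []   # (1/n) H(X_1,...,X_n)
conditional = []  # H(X_n | X_{n-1},...,X_1) = H(X_1..X_n) - H(X_1..X_{n-1})
prev_h = 0.0
for n in range(1, 13):
    h = joint_entropy(n)
    per_symbol.append(h / n)
    conditional.append(h - prev_h)
    prev_h = h
    print(f"n={n:2d}  H/n={per_symbol[-1]:.4f}  H(X_n|past)={conditional[-1]:.4f}")
```

For this first-order chain the conditional entropy stops changing from $n=2$ onward, while the per-symbol entropy drifts down toward the same value more slowly, as the Cesaro mean argument suggests.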

The generalization of the AEP theorem of the last chapter is true (but we won't prove it here): for a stationary ergodic process (whose random variables are identically distributed but not necessarily independent),


\begin{displaymath}-\frac{1}{n}\log p(X_1,\ldots,X_n) \rightarrow H(\Xc)
\end{displaymath}

with probability 1 (strong convergence!). Based on this generalization, it is possible to define a notion of typical sequences and to determine the number of typical sequences (approximately $2^{nH(\Xc)}$), each with probability about $2^{-nH(\Xc)}$. A representation therefore exists which requires approximately $nH(\Xc)$ bits.
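
The convergence can be observed in simulation. The sketch below is an illustration under stated assumptions, not the lecture's own example: it uses a hypothetical two-state Markov chain (irreducible and aperiodic, started in its stationary distribution) and an invented helper `sample_entropy`. For such a chain the entropy rate equals $H(X_2\vert X_1)$, which is computed directly for comparison.

```python
import math
import random

# Hypothetical two-state Markov chain, used only as an illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]
mu = [0.8, 0.2]  # stationary distribution of P (satisfies mu = mu P)

# Entropy rate of the stationary chain in bits: H(X_2 | X_1).
rate = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
            for i in range(2) for j in range(2))

def sample_entropy(n, rng):
    """Draw X_1..X_n from the chain and return -(1/n) log2 p(X_1,...,X_n)."""
    x = 0 if rng.random() < mu[0] else 1
    log_p = math.log2(mu[x])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[x][0] else 1
        log_p += math.log2(P[x][nxt])
        x = nxt
    return -log_p / n

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 10_000, 200_000):
    print(f"n={n:7d}  -(1/n) log2 p = {sample_entropy(n, rng):.4f}  rate = {rate:.4f}")
```

As $n$ grows, the sample value settles near the entropy rate, so a long typical sequence has probability close to $2^{-nH(\Xc)}$.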

There is a lot more material in the chapter about Markov processes and Hidden Markov models. However, in the interest of moving toward our goal, I will not talk about it in class.

Copyright 2008, Todd Moon. admin. (2006, May 15). Entropy Rates. Retrieved January 07, 2011, from Free Online Course Materials - USU OpenCourseWare Web site. This work is licensed under a Creative Commons License.