
Definitions and Basic Facts



Relative entropy and mutual information

Suppose there is a r.v. with true distribution p. Then (as we will see) we could represent that r.v. with a code that has average length H(p). However, due to incomplete information we do not know p; instead we assume that the distribution of the r.v. is q. Then (as we will see) the code would need more bits on average to represent the r.v. The difference in the number of bits is denoted as D(p||q). The quantity D(p||q) comes up often enough that it has a name: it is known as the relative entropy.


\begin{definition}
The {\bf relative entropy} or {\bf Kullback-Leibler distance} between two probability mass functions $p(x)$ and $q(x)$ is
\begin{displaymath}
D(p \Vert q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} = E_p \log
\frac{p(X)}{q(X)}
\end{displaymath}\end{definition}

Note that relative entropy is not symmetric, and that q (the second argument) appears only in the denominator.
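
As a small numerical sketch, the sum in the definition can be evaluated directly; the distributions p and q below are arbitrary examples on a three-symbol alphabet, and logs are taken base 2 so the answer is in bits.

\begin{verbatim}
import math

def relative_entropy(p, q):
    """D(p||q) in bits: sum of p(x) * log2(p(x)/q(x)) over the alphabet.

    Terms with p(x) = 0 contribute nothing; if q(x) = 0 where p(x) > 0,
    the relative entropy is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

# Arbitrary example distributions on a three-symbol alphabet.
p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]

print(relative_entropy(p, q))  # approx 0.085 bits
print(relative_entropy(q, p))  # approx 0.082 bits -- different: D is not symmetric
\end{verbatim}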

Another important concept is that of mutual information: how much information does one random variable tell us about another? In fact, this is perhaps the central idea in much of information theory. When we look at the output of a channel, we see the outcomes of a r.v. What we want to know is what went into the channel -- we want to know what was sent, and the only thing we have is what came out. The channel coding theorem (which is one of the high points we are trying to reach in the class) is basically a statement about mutual information.


\begin{definition}
Let $X$ and $Y$ be r.v.s with joint distribution $p(x,y)$ and marginal distributions $p(x)$ and $p(y)$. The {\bf mutual information} $I(X;Y)$ is the relative entropy between the joint distribution and the product of the marginal distributions:
\begin{displaymath}\begin{aligned}
I(X;Y) &= D(p(x,y) \Vert p(x)p(y)) \\
&= \sum_{x,y} p(x,y) \log\frac{p(x,y)}{p(x)p(y)}
\end{aligned}\end{displaymath}\end{definition}

Note that when X and Y are independent, p(x,y) = p(x)p(y) (definition of independence), so I(X;Y) = 0. This makes sense: if they are independent random variables then Y can tell us nothing about X.
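
As another small sketch, the definition can be evaluated for a joint pmf given as a table; the two joint distributions below are arbitrary examples, one dependent and one independent.

\begin{verbatim}
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint pmf table: joint[x][y] = p(x,y)."""
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    info = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                info += pxy * math.log2(pxy / (px[x] * py[y]))
    return info

# A dependent pair: X and Y tend to agree.
joint_dep = [[0.4, 0.1],
             [0.1, 0.4]]
print(mutual_information(joint_dep))   # approx 0.278 bits

# An independent pair: p(x,y) = p(x)p(y) everywhere.
joint_ind = [[0.25, 0.25],
             [0.25, 0.25]]
print(mutual_information(joint_ind))   # 0.0
\end{verbatim}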

An important interpretation of mutual information comes from the following.
\begin{theorem}
$I(X;Y) = H(X) - H(X\vert Y)$
\end{theorem}

Interpretation: The information that Y tells us about X is the reduction in uncertainty about X due to the knowledge of Y .


\begin{proof}
\begin{displaymath}\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y) \log\frac{p(x,y)}{p(x)p(y)} \\
&= \sum_{x,y} p(x,y) \log\frac{p(x\vert y)}{p(x)} \\
&= -\sum_{x,y} p(x,y) \log p(x) + \sum_{x,y} p(x,y) \log p(x\vert y) \\
&= H(X) - H(X\vert Y)
\end{aligned}\end{displaymath}\end{proof}

Observe that by symmetry

I(X;Y) = H(Y) - H(Y|X) = I(Y;X).

That is, Y tells as much about X as X tells about Y. Using H(X,Y) = H(X) + H(Y|X) we get

I(X;Y) = H(X) + H(Y) - H(X,Y).

The information that X tells about Y is the uncertainty in X plus the uncertainty about Y minus the uncertainty in both X and Y. We can summarize a bunch of statements about entropy as follows:

\begin{displaymath}
\boxed{\begin{aligned}
I(X;Y) &= H(X) - H(X\vert Y) \\
I(X;Y) &= H(Y) - H(Y\vert X) \\
I(X;Y) &= H(X) + H(Y) - H(X,Y) \\
I(X;Y) &= I(Y;X) \\
I(X;X) &= H(X)
\end{aligned}}
\end{displaymath}
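
These identities can also be checked numerically. The sketch below reuses the dependent joint pmf from the earlier example and computes each right-hand side from plain entropy calculations, using the chain rule H(X,Y) = H(Y) + H(X|Y) to obtain the conditional entropies.

\begin{verbatim}
import math

def H(pmf):
    """Entropy in bits of a one-dimensional pmf."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def H_joint(joint):
    """Joint entropy H(X,Y) from a joint pmf table."""
    return -sum(p * math.log2(p) for row in joint for p in row if p > 0)

# Same illustrative joint pmf as in the earlier sketch.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

HX, HY, HXY = H(px), H(py), H_joint(joint)
HX_given_Y = HXY - HY        # chain rule: H(X,Y) = H(Y) + H(X|Y)
HY_given_X = HXY - HX        # chain rule: H(X,Y) = H(X) + H(Y|X)

print(HX - HX_given_Y)       # H(X) - H(X|Y)
print(HY - HY_given_X)       # H(Y) - H(Y|X)
print(HX + HY - HXY)         # H(X) + H(Y) - H(X,Y)
# All three print the same value, I(X;Y), approx 0.278 bits.
\end{verbatim}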