
Introduction to Information Theory



The fundamental concept

One of the key (and initially counter-intuitive) concepts in information theory is that information is conveyed by randomness. This is information defined in a mathematical sense, which is not identical to the everyday human sense. For example, it is possible to measure the amount of information in a page of typewritten text. Due to the structure of the English language, the amount of information conveyed by each letter in a word is substantially less than the 7 bits of its ASCII representation (it is usually somewhere over 2 bits/letter). More information would be conveyed (in the mathematical sense) if the letters were completely random instead of structured into words.

On the other hand, it is not too difficult to make the connection between randomness and information. Consider the tossing of a coin: if you know the outcome of the coin toss before it is tossed, then learning the outcome gives you no additional information. If you have a biased coin that comes up heads 90% of the time, then you gain very little information when you learn it is heads. On the other hand, you gain a fair amount of information when it comes up tails; information is thus related to the degree of "surprise" at finding out the answer. Q: what weighting of the coin gives the maximum amount of information on average?
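One way to explore that question is numerically. The sketch below (the function name `binary_entropy` is mine, anticipating the entropy formula defined later in this section) computes the average information of a coin with P(heads) = p and scans a grid of weightings to find the maximizer:

```python
import math

def binary_entropy(p):
    """Average information (bits) from one toss of a coin with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome conveys no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Scan weightings and find the one that maximizes average information.
best_p = max((i / 100 for i in range(101)), key=binary_entropy)
print(best_p, binary_entropy(best_p))  # the fair coin, p = 0.5, gives 1 bit
```

Note that the 90%-heads coin from the paragraph above yields only about 0.469 bits per toss.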

Another very important concept, which we will say more about later, is that of typical sequences. In a sequence of bits of length n, there are some sequences which are (in a sense to be made precise later) typical. For example, in a sequence of coin-tossing outcomes for a fair coin, such as HHTHHTHTT, we would expect the numbers of heads and tails to be approximately equal (since the coin is fair). For a biased coin, we would expect the proportion of heads to follow the bias. Sequences that do not follow this trend, such as HHHHHHHHH, are thus atypical. A good part of information theory is capturing this concept of typicality as precisely as possible and using it to conclude how many bits are needed to represent sequences of data. The basic idea is to use bits to represent only the typical sequences, since the others do not come up very often. (Of course, when they do come up, you don't want to just throw them away.) This concept of typical sequences is what the asymptotic equipartition property is all about, which is the topic of Chapter 3.
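The idea can be illustrated with a small count (a sketch only; n = 12 and the "within 2 of n/2" window are arbitrary choices for illustration). For a fair coin, the sequences whose head-count is close to n/2 already account for most of the 2^n equally likely sequences:

```python
import itertools

n = 12
# Count length-n coin-toss sequences by their number of heads.
counts = {}
for seq in itertools.product("HT", repeat=n):
    k = seq.count("H")
    counts[k] = counts.get(k, 0) + 1

# Sequences with a head-count near n/2 dominate the total of 2**n.
near_half = sum(c for k, c in counts.items() if abs(k - n / 2) <= 2)
print(near_half / 2 ** n)  # roughly 0.85 of all sequences
```

As n grows, the concentration around n/2 becomes sharper, which is the phenomenon the asymptotic equipartition property makes precise.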
Suppose we have a discrete random variable X, and x is some particular outcome that occurs with probability p(x). Then we assign to the event x the uncertainty measure

\begin{displaymath}\text{uncertainty} = -\log p(x).\end{displaymath}

The base of the logarithm determines the units of information. If $\log_2$ is used, then the units are bits. If $\log_e$ (natural log) is used, then the units are nats. While nats are less familiar to engineers, they sometimes make the computations slightly easier. Q: how do you convert from bits to nats?

If a random variable has two outcomes, say $\zerobf$ and $\onebf$, with $p(\zerobf) = p$, then the uncertainty of the outcome $\zerobf$ is $-\log p$. As $p \to 1$ this uncertainty goes to zero: a certain outcome conveys no information. As $p \to 0$ it grows without bound: we would be infinitely surprised by the occurrence of something that is impossible to happen.

What is more commonly useful is the average uncertainty provided by a random variable X taking values in a space $\Xc$.

\begin{definition}The {\bf entropy} $H(X)$ of a discrete random variable $X$ is
\begin{displaymath}H(X) = -\sum_{x \in \Xc} p(x) \log p(x).\end{displaymath}\end{definition}
The entropy of an r.v. is a measure of the uncertainty of the random variable: it is the amount of information required, on average, to describe the random variable.

Take a fair coin:
\begin{displaymath}H(X) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \text{ bit}.\end{displaymath}
For the coin that comes up heads 90% of the time:
\begin{displaymath}H(X) = -(0.9 \log_2 0.9 + 0.1 \log_2 0.1) = 0.469 \text{ bits}.\end{displaymath}

What about a r.v. with three outcomes?
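A sketch of the computation from the definition above, including two three-outcome cases (the particular probability vectors are chosen only for illustration):

```python
import math

def entropy(p):
    """H(X) = -sum over x of p(x) * log2 p(x), from the definition above."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))         # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))         # biased coin: about 0.469 bits
print(entropy([1/3, 1/3, 1/3]))    # uniform three outcomes: log2(3), about 1.585 bits
print(entropy([0.5, 0.25, 0.25]))  # non-uniform three outcomes: 1.5 bits
```

Note that three equally likely outcomes give $\log_2 3 \approx 1.585$ bits, more than any two-outcome variable can provide, and that any departure from the uniform distribution lowers the entropy.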

Notation: We shall use the operator E to denote expectation. If $X \sim p(x)$ (read as: X is distributed according to p(x)), then for some function g(X) of the random variable,

\begin{displaymath}E g(X) = \sum_{x\in \Xc} g(x)p(x).\end{displaymath}

(This is also known as the law of the unconscious statistician.) Recall $EX$, $EX^2$, etc. Then for $g(x) = \log 1/p(x)$,

\begin{displaymath}H(X) = E g(X) = E \log \frac{1}{p(X)}.\end{displaymath}
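A quick numeric check that the two forms agree: the sketch below (the distribution is an arbitrary example of mine) evaluates the definitional sum and the expectation of $\log_2 1/p(X)$ side by side.

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # an illustrative distribution

# Direct definition: H(X) = -sum p(x) log2 p(x).
H_direct = -sum(px * math.log2(px) for px in p.values())

# As an expectation of g(X) = log2(1 / p(X)).
H_expect = sum(px * math.log2(1 / px) for px in p.values())

print(H_direct, H_expect)  # both equal 1.5 bits
```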

Copyright 2008, by the Contributing Authors. admin. (2006, May 17). Introduction to Information Theory. Retrieved January 07, 2011, from Free Online Course Materials — USU OpenCourseWare. This work is licensed under a Creative Commons License.