Data Compression


Introduction

We will apply what we know of entropy to the problem of data compression. We will introduce and prove the important Kraft inequality, Shannon codes, and Huffman codes.

We are now ready to use the tools we have been building over the last few weeks to work on the problem of efficient representation of data: data compression. To obtain usable coded representations, we introduce a class of codes known as instantaneous codes, which can be decoded without any backtracking. We present the Kraft inequality, an important result constraining the lengths of codewords. Then we show how to achieve a lower bound and introduce Huffman coding.

Some Simple Codes


\begin{definition}
A {\bf source code} $C$\ for a random variable $X$\ is a mapping from $\Xc$, the range of $X$, to $\Dc^*$, the set of finite-length strings of symbols from a $D$-ary alphabet $\Dc$. Let $C(x)$\ denote the codeword corresponding to $x$ and let $l(x)$\ denote the length of $C(x)$.
\end{definition}


\begin{example}
Suppose that $D=2$, $\Dc = \{0,1\}$, and $\Xc =
\{\text{red,bl...
... be coded as 010110. Can this be uniquely
decoded at the receiver?
\end{example}

\begin{example}
Let $X$\ be a r.v. with the following distribution and coding
...
...entropy. Also note that we can uniquely decode
a sequence of bits.
\end{example}

\begin{example}
Now let the code be assigned as
\begin{itemize}
\item $P(X=1)...
...mize}In this case, we cannot distinguish between $X=1$\ and $X=2$.
\end{example}

\begin{definition}
A code is said to be {\bf non-singular} if every element of the range of $X$\ maps into a different string in $\Dc^*$. That is, if
$x_j \neq x_i$\ then $C(x_i) \neq C(x_j)$.
\end{definition}
The code in the last example is singular.
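The non-singularity test is easy to state computationally: distinct symbols must receive distinct codewords. A minimal Python sketch follows; the symbols and codewords below are assumed for illustration (in the spirit of the last example, where $X=1$ and $X=2$ collide), not taken from the lecture's actual table.

```python
def is_nonsingular(code):
    """A code is non-singular if distinct symbols map to distinct codewords."""
    codewords = list(code.values())
    return len(codewords) == len(set(codewords))

# Assumed illustrative codes, not the lecture's tables.
singular = {1: "0", 2: "0", 3: "1"}       # X=1 and X=2 share a codeword
nonsingular = {1: "0", 2: "10", 3: "11"}  # all codewords distinct

print(is_nonsingular(singular))      # False
print(is_nonsingular(nonsingular))   # True
```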

We have already met the idea of stringing codewords together in succession. This notion has a formal definition:
\begin{definition}
An {\bf extension} $C^*$\ of a code $C$\ is a mapping from finite-length strings of $\Xc$\ to finite-length strings of $\Dc$, defined by
\begin{displaymath}
C(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n),
\end{displaymath}
where the RHS is the {\bf concatenation} of the codewords.
\end{definition}

\begin{example}
If $C(x_1) = 00$\ and $C(x_2)= 11$, then $C(x_1x_2) = 0011$.
\end{example}
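Using the mapping from the example above, the extension is simply string concatenation. A minimal Python sketch (the symbol names $x_1, x_2$ are written as strings here):

```python
def extend(code, symbols):
    """Extension C*: concatenate the codewords of a string of symbols."""
    return "".join(code[s] for s in symbols)

# The code from the example: C(x1) = 00, C(x2) = 11.
C = {"x1": "00", "x2": "11"}
print(extend(C, ["x1", "x2"]))  # "0011"
```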

\begin{definition}
A code is called {\bf uniquely decodable} if its extension is
uniquely decodable.
\end{definition}
That is, if we string together a bunch of codewords, we want to be able to tell where one codeword leaves off and another begins. The first example code presented is not uniquely decodable.
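A code fails to be uniquely decodable exactly when some bit string admits more than one parse into codewords. A small recursive Python sketch can count parses; the code used below is an assumed illustration of a non-singular but not uniquely decodable code, not one of the lecture's examples.

```python
def count_parses(code, s):
    """Count the ways bit string s can be split into codewords."""
    if s == "":
        return 1
    return sum(count_parses(code, s[len(w):])
               for w in code.values() if s.startswith(w))

# Assumed illustrative code: non-singular, but "010" parses three ways
# (010 | 01,0 | 0,10), so the code is not uniquely decodable.
C = {1: "0", 2: "010", 3: "01", 4: "10"}
print(count_parses(C, "010"))  # 3
```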

There are codes which are uniquely decodable, but for which the decoder must look ahead, and possibly backtrack, to recover a unique symbol sequence. In practice, this means the decoding hardware is more complicated, and these kinds of codes are avoided where possible.
\begin{definition}
A code is called a {\bf prefix code} or an {\bf instantaneous code}
if no codeword is a prefix of any other codeword.
\end{definition}
An instantaneous codeword can be decoded without look-ahead, since the end of a codeword is immediately recognizable (it is not the beginning of any other codeword). Instantaneous codes are ``self-punctuating.''
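The self-punctuating property translates directly into a one-pass decoder: emit a symbol as soon as the accumulated bits match a codeword, with no look-ahead. A Python sketch, using an assumed prefix code (not one of the lecture's tables):

```python
def decode_instantaneous(code, bits):
    """Decode a prefix-free code in one left-to-right pass.

    Because no codeword is a prefix of another, the end of a codeword
    is recognizable the instant its last bit arrives.
    """
    inverse = {w: x for x, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:   # end of a codeword: emit and reset
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "bit string ended in the middle of a codeword"
    return out

# Assumed prefix code for illustration.
C = {1: "0", 2: "10", 3: "110", 4: "111"}
print(decode_instantaneous(C, "0110111"))  # [1, 3, 4]
```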
\begin{example}
The table below illustrates three different codes assigned to t...
...th of the string of 0s is
even, the first source symbol must be 3.
\end{example}

Copyright 2008, Todd Moon. (2006, May 15). Data Compression. Retrieved January 07, 2011, from Free Online Course Materials — USU OpenCourseWare Web site: http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture6_1.htm. This work is licensed under a Creative Commons License.