Data Compression


Bounds on the optimal code

The theorem just proved shows that the expected length must be at least H D ( X ). For a physically implementable instantaneous code (that is, a code with integer codeword lengths), we can also find an upper bound on the expected code length:


\begin{displaymath}H_D(X) \leq L < H_D(X)+1.\end{displaymath}


That is, the overhead due to the integer codeword lengths is not more than one bit.

The codeword lengths are found by


\begin{displaymath}l_i = \left\lceil \log_D\left(\frac{1}{p_i}\right)\right\rceil\end{displaymath}


where $\lceil x \rceil$ is the smallest integer $\geq x$ . These codeword lengths satisfy the Kraft inequality:


\begin{displaymath}\sum_i D^{-l_i} = \sum_i D^{-\lceil \log_D \frac{1}{p_i}\rceil}
\leq \sum_{i} D^{-\log_D\frac{1}{p_i}} =\sum_i p_i = 1.\end{displaymath}

The codeword lengths satisfy

\begin{displaymath}\log_D \frac{1}{p_i} \leq l_i < \log_D \frac{1}{p_i}+1.\end{displaymath}

Taking the expectation through we get

\begin{displaymath}H_D(X) \leq L < H_D(X)+1.\end{displaymath}
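As a quick numerical check, here is a minimal Python sketch of the binary case ($D = 2$) with an illustrative, non-dyadic distribution: the ceiling lengths satisfy the Kraft inequality and the entropy bounds above.

```python
import math

# Shannon code lengths l_i = ceil(log2(1/p_i)) for an illustrative
# (non-dyadic) distribution p, binary code alphabet D = 2.
p = [0.4, 0.3, 0.2, 0.1]

lengths = [math.ceil(math.log2(1 / pi)) for pi in p]   # l_i
kraft = sum(2 ** -l for l in lengths)                  # sum_i D^{-l_i}
L = sum(pi * l for pi, l in zip(p, lengths))           # expected length
H = -sum(pi * math.log2(pi) for pi in p)               # entropy H_2(X)

assert kraft <= 1          # Kraft inequality: a prefix code exists
assert H <= L < H + 1      # H(X) <= L < H(X) + 1
print(lengths, kraft, L, H)
```

Because the distribution is not dyadic, the expected length L sits strictly between the entropy and the entropy plus one bit.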

The next trick is to reduce the overhead (of up to one bit) by spreading it over several symbols. Suppose we are sending a sequence of symbols, drawn independently according to the distribution p ( x ). A sequence of n symbols can be regarded as a symbol from the alphabet $\Xc^n$ .

Let $L_n$ be the expected codeword length per input symbol:


\begin{displaymath}L_n = \frac{1}{n}\sum_{\xbf \in \Xc^n} p(x_1,x_2,\ldots,x_n)\,
l(x_1,x_2,\ldots,x_n) = \frac{1}{n} E\,l(X_1,X_2,\ldots,X_n).\end{displaymath}


Applying the bounds above to the block codeword lengths:


\begin{displaymath}H(X_1,X_2,\ldots,X_n) \leq E\,l(X_1,X_2,\ldots,X_n) <
H(X_1,X_2,\ldots,X_n) + 1.\end{displaymath}


Since the symbols are i.i.d., $H(X_1,X_2,\ldots,X_n) = nH(X_1)$ . Dividing through by n we obtain


\begin{displaymath}H(X) \leq L_n < H(X) + \frac{1}{n}.\end{displaymath}


By choosing the block size sufficiently large, the average code length can be made arbitrarily close to the entropy.
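The effect of blocking can be seen numerically. The following sketch (illustrative two-symbol i.i.d. source, binary Shannon code on blocks) shows the per-symbol length $L_n$ shrinking toward the entropy as the block size grows.

```python
import math
from itertools import product

# Per-symbol length L_n of a Shannon code built on blocks of n i.i.d.
# symbols, for an illustrative two-symbol source. L_n approaches H(X).
p = {'a': 0.7, 'b': 0.3}
H = -sum(pi * math.log2(pi) for pi in p.values())

for n in (1, 2, 4, 8):
    L_n = 0.0
    for block in product(p, repeat=n):
        prob = math.prod(p[s] for s in block)          # i.i.d. product
        L_n += prob * math.ceil(math.log2(1 / prob))   # Shannon length
    L_n /= n                                           # per input symbol
    print(f"n={n}: L_n={L_n:.4f}  (H={H:.4f})")
    assert H <= L_n < H + 1 / n                        # theorem's bound
```

Each doubling of the block size halves the worst-case overhead, at the cost of a codebook that grows exponentially in $n$.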

The next observation is that if the symbols are not independent, we can still write


\begin{displaymath}H(X_1,X_2,\ldots,X_n) \leq E\,l(X_1,X_2,\ldots,X_n) <
H(X_1,X_2,\ldots,X_n) + 1.\end{displaymath}


Dividing through by n we obtain


\begin{displaymath}\frac{H(X_1,X_2,\ldots,X_n)}{n} \leq L_n <
\frac{H(X_1,X_2,\ldots,X_n)}{n} + \frac{1}{n}.\end{displaymath}


If $X$ is a stationary stochastic process, taking the limit as $n \rightarrow \infty$ yields convergence to the entropy rate:


\begin{displaymath}L_n \rightarrow H(\Xc).\end{displaymath}


Another question is what happens if the distribution used to design the code is not the actual distribution. Consider the code with lengths $l(x) = \lceil \log \frac{1}{q(x)} \rceil$, designed for the distribution q ( x ), while the true distribution is p ( x ). The expected length under $p(x)$ of this code satisfies

\begin{displaymath}H(p) + D(p\Vert q) \leq E_p l(X) < H(p) + D(p\Vert q) + 1.\end{displaymath}

That is, the mistaken distribution costs us an extra $D(p\Vert q)$ bits per symbol.

The upper bound follows from

\begin{displaymath}\begin{aligned}
E_p l(X) &= \sum_x p(x) \left\lceil \log \frac{1}{q(x)} \right\rceil \\
&< \sum_x p(x) \left( \log \frac{1}{q(x)} + 1 \right) \\
&= \sum_x p(x) \log \frac{p(x)}{q(x)}\frac{1}{p(x)} + 1 \\
&= D(p\Vert q) + H(p) + 1.
\end{aligned}\end{displaymath}

The lower bound is similar.

Copyright 2008, by the Contributing Authors. admin. (2006, May 17). Data Compression. Free Online Course Materials, USU OpenCourseWare. This work is licensed under a Creative Commons License.