Data Compression


Huffman codes

Huffman codes are optimal prefix codes for a given distribution, and if we know the distribution, they are easy to find. The code is built on the premise of assigning longer codewords to less-likely symbols, doing so in a tree-structured way so that the resulting code is prefix-free.
Consider a random variable $X$ taking values in a set $\mathcal{X}$. To build the Huffman code, repeatedly merge the two least-probable symbols into a single node, then assign codewords by reading the bits off the resulting tree. In the worked example, the average codelength is 2.3 bits.
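The merge-and-label procedure can be sketched in Python using a min-heap. The five-symbol distribution below is our own illustrative choice, not necessarily the one from the course's example; it happens to give an average codelength of 2.3 bits.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code: repeatedly merge the two
    least-probable nodes, prepending a bit to each side's codewords."""
    tiebreak = count()  # breaks probability ties so dicts are never compared
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable node
        p1, _, code1 = heapq.heappop(heap)  # second least probable node
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Illustrative distribution (our own choice, not from the text):
probs = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)  # average codelength: 2.3
```

Because every merge joins complete subtrees, no codeword can be a prefix of another, so the result is instantaneous by construction.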

Codes over alphabets with more than $D = 2$ symbols can also be built, as described in the book.

Proving the optimality of the Huffman code begins with the following simple lemma:

For any distribution, there exists an optimal instantaneous code (one of minimum expected length) in which (1) the codeword lengths are ordered inversely with the probabilities, so that $p_j > p_k$ implies $l_j \le l_k$; (2) the two longest codewords have the same length; and (3) the two longest codewords differ only in the last bit and correspond to the two least-likely symbols.

The proof is a sequence of exchange arguments. If $p_j > p_k$ but $l_j > l_k$, simply swap the lengths: the expected length can only decrease, so the swapped code is at least as good. If the longest codeword had no sibling of the same length, its last bit could be deleted while keeping the code instantaneous, yielding a strictly shorter code, which
contradicts the optimality property.

For a code on $m$ symbols, assume (w.l.o.g.) that the probabilities are ordered $p_1 \ge p_2 \ge \cdots \ge p_m$. Define the merged code on $m-1$ symbols by merging the two least-probable symbols $p_m, p_{m-1}$. The codeword for this merged symbol is the common prefix of the two least-probable (longest) codewords, which exists by the lemma. The expected length of the code $C_m$ is

\begin{align*}
L(C_m) &= \sum_{i=1}^m p_i l_i \\
&= \sum_{i=1}^{m-2} p_i l_i + p_{m-1}(l_m'+1) + p_m(l_m'+1) \\
&= L(C_{m-1}) + p_{m-1} + p_m,
\end{align*}

where $l_m'$ is the length of the merged codeword in $C_{m-1}$, so that $l_{m-1} = l_m = l_m'+1$ and $L(C_{m-1}) = \sum_{i=1}^{m-2} p_i l_i + (p_{m-1}+p_m)\,l_m'$.

The optimization problem on $m$ symbols has been reduced to an optimization problem on $m-1$ symbols. Proceeding inductively, we get down to two symbols, for which the optimal code is obvious: codewords 0 and 1.
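As a numerical sanity check on the reduction, here is a small example with a four-symbol distribution of our own choosing (the codeword lengths are the Huffman lengths for each distribution, worked out by hand):

```python
# Hypothetical distribution, ordered p_1 >= ... >= p_m (ours, not the text's).
p = [0.4, 0.3, 0.15, 0.15]
lengths_m = [1, 2, 3, 3]        # Huffman codeword lengths for C_m

# Merge the two least-probable symbols; the merged codeword is one bit shorter.
p_merged = [0.4, 0.3, 0.3]
lengths_merged = [1, 2, 2]      # Huffman codeword lengths for C_{m-1}

L_m = sum(pi * li for pi, li in zip(p, lengths_m))
L_m1 = sum(pi * li for pi, li in zip(p_merged, lengths_merged))

# The reduction step: L(C_m) = L(C_{m-1}) + p_{m-1} + p_m
assert abs(L_m - (L_m1 + p[-2] + p[-1])) < 1e-9
```

Here $L(C_m) = 1.9$ bits and $L(C_{m-1}) = 1.6$ bits, and the gap is exactly $p_3 + p_4 = 0.3$.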

Copyright 2008, by the Contributing Authors. admin. (2006, May 17). Data Compression. Retrieved January 07, 2011, from Free Online Course Materials, USU OpenCourseWare. This work is licensed under a Creative Commons License.