Data Compression
Huffman codes
A Huffman code is an optimal prefix code for a given distribution: no
other prefix code has a smaller expected codeword length.
What's more, if we know the distribution, a Huffman code is
easy to find. The construction rests on assigning longer
codewords to less likely symbols, and doing so in a tree-structured
way so that the resulting code is prefix-free.
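The tree-structured construction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `huffman_code`, the dict-based input, and the example distribution are all assumptions made here. The sketch repeatedly merges the two least probable groups of symbols and prepends one bit to every codeword in each group.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return {symbol: binary codeword} for a dict {symbol: probability}."""
    if len(probs) == 1:
        # Degenerate case (an assumption of this sketch): one symbol gets "0".
        return {sym: "0" for sym in probs}
    tiebreak = count()  # unique integers so the heap never compares dicts
    # Each heap entry is (probability, tiebreaker, {symbol: partial codeword}).
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable group
        p1, _, group1 = heapq.heappop(heap)  # second least probable group
        # Merging two groups adds one bit to the front of every codeword in them.
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical example distribution.
probs = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.15}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
```

For this distribution the three most likely symbols receive 2-bit codewords and the two least likely receive 3-bit codewords, for an expected length of 2.3 bits. Which symbol gets which bit pattern depends on how ties are broken, but the expected length does not.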
Codes over an alphabet of more than D=2 code symbols can also be built, as described in the book.
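One common way to handle D>2 is to merge the D least probable groups at each step, after padding the symbol set with zero-probability dummy symbols so that every merge combines exactly D groups. The sketch below assumes that approach. The name `huffman_code_dary`, the use of the digits 0 to D-1 as code symbols, and the example distribution are illustrative choices made here, not taken from the book.

```python
import heapq
from itertools import count


def huffman_code_dary(probs, D=3):
    """Sketch of a D-ary Huffman code: {symbol: string over digits 0..D-1}."""
    if not 2 <= D <= 10:
        raise ValueError("this sketch writes code symbols as single digits 0-9")
    tiebreak = count()  # unique integers so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    # Pad with zero-probability dummies until (count - 1) is a multiple of
    # (D - 1), so that every merge below combines exactly D groups.
    while len(heap) > 1 and (len(heap) - 1) % (D - 1) != 0:
        heap.append((0.0, next(tiebreak), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(D):
            p, _, group = heapq.heappop(heap)  # next least probable group
            total += p
            merged.update({s: str(digit) + c for s, c in group.items()})
        heapq.heappush(heap, (total, next(tiebreak), merged))
    # Note: a single input symbol ends up with the empty codeword here.
    return heap[0][2]


# Hypothetical example: six symbols, ternary code alphabet (one dummy is added).
probs = {"a": 0.25, "b": 0.25, "c": 0.2, "d": 0.1, "e": 0.1, "f": 0.1}
code = huffman_code_dary(probs, D=3)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
```

The dummy symbols carry no codewords in the result; they only occupy unused leaves at the deepest level of the tree.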
Proving the optimality of the Huffman code begins with the following
simple lemma:
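For a code on m symbols, assume (w.l.o.g.) that the probabilities
are ordered
$p_1 \geq p_2 \geq \cdots \geq p_m$.
Then there is an optimal code in which more probable symbols never have
longer codewords, and in which the two least probable symbols have the
longest codewords, of equal length and differing only in the last bit.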
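Define the merged code $C_{m-1}$ on
m-1 symbols by merging the two least probable symbols, with probabilities
$p_m$ and $p_{m-1}$, into a single symbol of probability $p_{m-1} + p_m$.
The codeword for this merged symbol is the common prefix of
the two least-probable (longest) codewords, which, by the lemma,
exists. Writing $l_i$ for the codeword lengths of $C_m$ (so that
$l_{m-1} = l_m$ and the merged codeword has length $l_m - 1$), the
expected length of the code $C_m$ is
$$L(C_m) = \sum_{i=1}^{m} p_i l_i = \sum_{i=1}^{m-2} p_i l_i + (p_{m-1} + p_m)(l_m - 1) + p_{m-1} + p_m = L(C_{m-1}) + p_{m-1} + p_m.$$
The term $p_{m-1} + p_m$ does not depend on the choice of code, so
minimizing $L(C_m)$ amounts to minimizing $L(C_{m-1})$.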
The optimization problem on m symbols has been reduced to an optimization problem on m-1 symbols. Proceeding inductively, we get down to two symbols, for which the optimal code is obvious: one symbol gets the codeword 0 and the other gets the codeword 1.
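The inductive reduction can be checked numerically. Each merge of the two least probable symbols adds their combined probability to the expected length, and the two-symbol base case contributes one bit per symbol. The recursive sketch below mirrors that argument; the function name `huffman_expected_length` and the example distribution are assumptions made for illustration.

```python
def huffman_expected_length(probs):
    """Expected length of a binary Huffman code, computed by the reduction
    from m symbols to m-1 symbols (requires at least two symbols)."""
    p = sorted(probs, reverse=True)  # p[0] >= p[1] >= ... >= p[-1]
    if len(p) < 2:
        raise ValueError("need at least two symbols")
    if len(p) == 2:
        # Base case: codewords 0 and 1, each of length 1.
        return p[0] + p[1]
    # Merge the two least probable symbols into one symbol on m-1 symbols.
    merged = p[:-2] + [p[-2] + p[-1]]
    # Splitting the merged symbol back adds one bit to both of its codewords.
    return huffman_expected_length(merged) + p[-2] + p[-1]


# Hypothetical example distribution on five symbols.
length = huffman_expected_length([0.25, 0.25, 0.2, 0.15, 0.15])
```

For this distribution the successive merges contribute 0.3, 0.45 and 0.55, and the base case contributes 1, giving an expected length of 2.3 bits.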