Data Compression
Bounds on the optimal code
The theorem just proved shows that the expected code length $L$ must be at least $H_D(X)$. For a physically implementable instantaneous code (that is, a code with integer codeword lengths), we can also find an upper bound on the expected length of the optimal code:

$$H_D(X) \le L < H_D(X) + 1.$$

That is, the overhead due to the integer codeword lengths is not more than one bit.
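The codeword lengths are found by

$$l_i = \left\lceil \log_D \frac{1}{p_i} \right\rceil,$$

where $\lceil x \rceil$ is the smallest integer $\ge x$. These codeword lengths satisfy the Kraft inequality:

$$\sum_i D^{-l_i} = \sum_i D^{-\lceil \log_D (1/p_i) \rceil} \le \sum_i D^{-\log_D (1/p_i)} = \sum_i p_i = 1.$$

The codeword lengths satisfy

$$\log_D \frac{1}{p_i} \le l_i < \log_D \frac{1}{p_i} + 1.$$

Multiplying by $p_i$ and summing over $i$ (that is, taking the expectation), with $L = \sum_i p_i l_i$, we get

$$H_D(X) \le L < H_D(X) + 1.$$

Since an optimal code can do no worse than this code, the same upper bound holds for the optimal expected length.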
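As a quick numerical check of this bound, the following Python sketch assigns each symbol the length ceil(log2(1/p)) in the binary case (D = 2), then verifies the Kraft inequality and the one-bit overhead. The distribution `probs` and the helper names are illustrative assumptions, not part of the original notes.

```python
import math

def shannon_lengths(probs):
    """Binary codeword lengths l_i = ceil(log2(1/p_i)); assumes every p_i > 0."""
    return [math.ceil(-math.log2(p)) for p in probs]

def entropy(probs):
    """Entropy in bits of a distribution with strictly positive probabilities."""
    return -sum(p * math.log2(p) for p in probs)

def expected_length(probs, lengths):
    """Expected codeword length L = sum_i p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

def kraft_sum(lengths):
    """Left-hand side of the binary Kraft inequality, sum_i 2^(-l_i)."""
    return sum(2.0 ** -l for l in lengths)

probs = [0.4, 0.3, 0.2, 0.1]   # hypothetical example distribution
lengths = shannon_lengths(probs)
H = entropy(probs)
L = expected_length(probs, lengths)

print("lengths:", lengths)
print("Kraft sum:", kraft_sum(lengths))
print("H <= L < H + 1:", H <= L < H + 1)
```

For this distribution the lengths come out as 2, 2, 3, 4 with Kraft sum 0.6875, and the expected length of 2.4 bits sits between the entropy (about 1.85 bits) and the entropy plus one.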
The next trick is to reduce the overhead (of up to one bit) by spreading it over several symbols. Suppose we are sending a sequence of symbols, drawn independently according to the distribution $p(x)$. A sequence of $n$ symbols can be regarded as a single symbol from the alphabet $\mathcal{X}^n$.
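Let $L_n$ be the expected codeword length per input symbol:

$$L_n = \frac{1}{n} \sum p(x_1, \ldots, x_n)\, l(x_1, \ldots, x_n) = \frac{1}{n} E\, l(X_1, \ldots, X_n).$$

Applying the one-bit bound to the codeword lengths of the blocks:

$$H(X_1, \ldots, X_n) \le E\, l(X_1, \ldots, X_n) < H(X_1, \ldots, X_n) + 1.$$

Since the symbols are i.i.d., $H(X_1, \ldots, X_n) = \sum_i H(X_i) = nH(X)$, so dividing by $n$ gives

$$H(X) \le L_n < H(X) + \frac{1}{n}.$$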
By choosing the block size sufficiently large, the average code length can be made arbitrarily close to the entropy.
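The effect of the block size can be seen in a small experiment. The sketch below, which assumes a binary code and a hypothetical two-symbol i.i.d. source, enumerates every block of length n, gives it the length ceil(log2(1/p(block))), and reports the expected length per source symbol. This length assignment is not necessarily optimal, but it already falls within 1/n bits of the entropy, so an optimal block code does too.

```python
import itertools
import math

def block_shannon_rate(probs, n):
    """Expected codeword length per source symbol when blocks of n i.i.d.
    symbols are coded with lengths ceil(log2(1/p(block)))."""
    total = 0.0
    for block in itertools.product(probs, repeat=n):
        p_block = math.prod(block)   # i.i.d.: block probability is a product
        total += p_block * math.ceil(-math.log2(p_block))
    return total / n

probs = [0.9, 0.1]   # hypothetical skewed binary source
H = -sum(p * math.log2(p) for p in probs)

for n in (1, 2, 4, 8):
    L_n = block_shannon_rate(probs, n)
    # Each line shows n, the per-symbol length, and whether H <= L_n < H + 1/n.
    print(n, round(L_n, 4), H <= L_n < H + 1 / n)
```

The enumeration grows exponentially in n, so this is only practical for small alphabets and short blocks; it is meant as a check of the inequality rather than as a coding scheme.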
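The next observation is that if the symbols are not independent, we can still bound the expected length of the codeword for a block $X_1, \ldots, X_n$:

$$H(X_1, \ldots, X_n) \le E\, l(X_1, \ldots, X_n) < H(X_1, \ldots, X_n) + 1.$$

Dividing through by $n$ we obtain

$$\frac{H(X_1, \ldots, X_n)}{n} \le L_n < \frac{H(X_1, \ldots, X_n)}{n} + \frac{1}{n}.$$

If $X$ is a stationary stochastic process, then taking the limit $n \to \infty$ yields

$$L_n \to H(\mathcal{X}),$$

where $H(\mathcal{X}) = \lim_{n \to \infty} H(X_1, \ldots, X_n)/n$ is the entropy rate of the process.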
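To illustrate the limit for a dependent source, the following sketch uses a hypothetical two-state stationary Markov chain (the transition matrix `P` and its stationary distribution `mu` are assumptions chosen for the example). It computes the joint entropy of the first n symbols by brute-force enumeration and compares the per-symbol value with the limiting value, which for a stationary Markov chain is the conditional entropy of the next state given the current one.

```python
import itertools
import math

# Hypothetical transition matrix: P[a][b] = Pr(next state = b | current = a).
P = [[0.9, 0.1],
     [0.5, 0.5]]
# Stationary distribution of P (solves mu P = mu), so the chain is stationary.
mu = [5 / 6, 1 / 6]

def joint_entropy(n):
    """H(X_1, ..., X_n) in bits, by enumerating every length-n state sequence."""
    h = 0.0
    for seq in itertools.product(range(2), repeat=n):
        p = mu[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        h -= p * math.log2(p)
    return h

def row_entropy(row):
    """Entropy in bits of one probability vector with positive entries."""
    return -sum(p * math.log2(p) for p in row)

# Limiting per-symbol entropy for this chain: sum_a mu[a] * H(P[a]).
entropy_rate = sum(m * row_entropy(row) for m, row in zip(mu, P))

for n in (1, 2, 5, 10):
    print(n, round(joint_entropy(n) / n, 4), round(entropy_rate, 4))
```

The per-symbol joint entropy starts at the entropy of the stationary distribution and decreases toward the limiting value as n grows, which is the quantity the per-symbol code length approaches.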
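Another question: what if the distribution used to design the code is not the same as the actual distribution? Consider the code designed with lengths

$$l(x) = \left\lceil \log \frac{1}{q(x)} \right\rceil$$

for the distribution $q(x)$, while the true distribution is $p(x)$. Then the expected length under $p$ satisfies

$$H(p) + D(p \| q) \le E_p\, l(X) < H(p) + D(p \| q) + 1.$$

That is, the mistaken distribution costs us an extra $D(p \| q)$ bits per symbol to code.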
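The penalty for designing with the wrong distribution can be checked numerically. This sketch assumes a binary code and two hypothetical distributions `p` (true) and `q` (assumed at design time); it builds the lengths ceil(log2(1/q(x))) and compares the expected length under `p` with the entropy of `p` plus the relative entropy between `p` and `q`.

```python
import math

def entropy(p):
    """Entropy in bits; assumes all probabilities are positive."""
    return -sum(pi * math.log2(pi) for pi in p)

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; assumes p and q are positive."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

def shannon_lengths(q):
    """Binary codeword lengths designed for q: ceil(log2(1/q(x)))."""
    return [math.ceil(-math.log2(qi)) for qi in q]

p = [0.5, 0.3, 0.2]   # hypothetical true distribution
q = [0.2, 0.3, 0.5]   # hypothetical (wrong) design distribution

lengths = shannon_lengths(q)
E_len = sum(pi * li for pi, li in zip(p, lengths))   # expectation under p
H_p = entropy(p)
D_pq = kl_divergence(p, q)

print("lengths designed for q:", lengths)
print("expected length under p:", E_len)
print("H(p) + D(p||q):", H_p + D_pq)
print("bound holds:", H_p + D_pq <= E_len < H_p + D_pq + 1)
```

Here the most probable symbol under `p` receives the longest codeword (3 bits), so the expected length of 2.3 bits is well above the entropy of `p`; the gap is accounted for by the relative entropy plus the integer-length overhead of less than one bit.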


















