# Data Compression


## Bounds on the optimal code

The theorem just proved shows that the expected length must be at least *H*_{D}(*X*). We can now show that for a physically implementable instantaneous code (that is, a code with integer codeword lengths), there is a matching upper bound on the expected code length:

$$ H_D(X) \le L < H_D(X) + 1. $$

That is, the overhead due to the integer codeword lengths is not more than one bit.

The codeword lengths are found by

$$ l_i = \left\lceil \log_D \frac{1}{p_i} \right\rceil, $$

where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. These codeword lengths satisfy the Kraft inequality:

$$ \sum_i D^{-l_i} = \sum_i D^{-\lceil \log_D (1/p_i) \rceil} \le \sum_i D^{-\log_D (1/p_i)} = \sum_i p_i = 1. $$

The codeword lengths satisfy

$$ \log_D \frac{1}{p_i} \le l_i < \log_D \frac{1}{p_i} + 1. $$

Taking the expectation throughout, we get

$$ H_D(X) \le L < H_D(X) + 1. $$

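As a concrete check of this construction, here is a short sketch (the `shannon_lengths` helper and the example distribution are illustrative, not from the text) that computes binary Shannon code lengths for a dyadic distribution and verifies both the Kraft inequality and the entropy bounds:

```python
import math

def shannon_lengths(p):
    # Binary Shannon code: l_i = ceil(log2(1 / p_i))
    return [math.ceil(-math.log2(pi)) for pi in p]

p = [0.5, 0.25, 0.125, 0.125]                   # example distribution
lengths = shannon_lengths(p)                    # -> [1, 2, 3, 3]
kraft = sum(2.0 ** -l for l in lengths)         # Kraft sum, must be <= 1
L = sum(pi * li for pi, li in zip(p, lengths))  # expected codeword length
H = -sum(pi * math.log2(pi) for pi in p)        # entropy in bits
print(lengths, kraft, L, H)
```

Since this distribution is dyadic, the lengths equal $\log_2(1/p_i)$ exactly and $L = H$; for non-dyadic distributions the expected length exceeds the entropy by less than one bit.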
The next trick is to reduce the overhead (of up to one bit) by spreading it over several symbols. Suppose we are sending a sequence of symbols, drawn independently according to the distribution *p*(*x*). A sequence of *n* symbols can be regarded as a single symbol from the alphabet $\mathcal{X}^n$.

Let *L*_{n} be the expected codeword length per input symbol:

$$ L_n = \frac{1}{n} \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n)\, l(x_1, \ldots, x_n) = \frac{1}{n}\, E\, l(X_1, \ldots, X_n). $$

Applying the inequality to the block codeword lengths:

$$ H(X_1, \ldots, X_n) \le E\, l(X_1, \ldots, X_n) < H(X_1, \ldots, X_n) + 1. $$

Since the symbols are i.i.d., $H(X_1, \ldots, X_n) = nH(X)$. Dividing through by *n* we obtain

$$ H(X) \le L_n < H(X) + \frac{1}{n}. $$

By choosing the block size sufficiently large, the average code length can be made arbitrarily close to the entropy.
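A small numerical sketch (the two-symbol source and the block sizes are made up for illustration) shows the per-symbol overhead shrinking like 1/*n* when a binary Shannon code is applied to blocks of *n* i.i.d. symbols:

```python
import itertools
import math

p = {'a': 0.9, 'b': 0.1}                       # illustrative i.i.d. source
H = -sum(pi * math.log2(pi) for pi in p.values())

L = {}
for n in (1, 2, 4, 8):
    # Probability of each n-symbol block under the i.i.d. model
    probs = [math.prod(p[s] for s in block)
             for block in itertools.product(p, repeat=n)]
    # Shannon code lengths for the block alphabet
    lengths = [math.ceil(-math.log2(q)) for q in probs]
    # Expected codeword length per input symbol
    L[n] = sum(q * l for q, l in zip(probs, lengths)) / n
    print(n, round(L[n], 4), round(H + 1 / n, 4))
```

For every block size the output respects $H \le L_n < H + 1/n$, so the gap to the entropy is at most 1/*n* bits.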

The next observation is that if the symbols are *not* independent, we can still write

$$ H(X_1, \ldots, X_n) \le E\, l(X_1, \ldots, X_n) < H(X_1, \ldots, X_n) + 1. $$

Dividing through by *n* we obtain

$$ \frac{H(X_1, \ldots, X_n)}{n} \le L_n < \frac{H(X_1, \ldots, X_n)}{n} + \frac{1}{n}. $$

If *X* is a stationary stochastic process, then taking the limit as *n* → ∞ yields

$$ L_n \to H(\mathcal{X}), $$

the entropy rate of the process.

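For a concrete dependent source, consider a two-state Markov chain (the transition matrix below is a made-up example). Started from its stationary distribution, the chain rule gives $H(X_1, \ldots, X_n) = H(X_1) + (n-1)\,H(X_2 \mid X_1)$, so the per-symbol entropy converges to the entropy rate:

```python
import math

# Two-state Markov chain (hypothetical example): P[i][j] = Pr(next = j | current = i)
P = [[0.9, 0.1], [0.4, 0.6]]

# Stationary distribution; for two states, mu0 = P[1][0] / (P[0][1] + P[1][0])
mu0 = P[1][0] / (P[0][1] + P[1][0])
mu = [mu0, 1 - mu0]

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

rate = mu[0] * H(P[0]) + mu[1] * H(P[1])      # entropy rate H(X_2 | X_1)

per_symbol = {}
for n in (1, 10, 100, 1000):
    joint = H(mu) + (n - 1) * rate            # chain rule for a stationary Markov source
    per_symbol[n] = joint / n
    print(n, round(per_symbol[n], 4))
```

The printed values decrease toward the entropy rate, which is what a good block code for this source could achieve per symbol.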
Another question is: what if the distribution used to design the code is not the same as the actual distribution? Consider the code designed with codeword lengths $l(x) = \lceil \log (1/q(x)) \rceil$ for the distribution *q*(*x*), while the true distribution is *p*(*x*). Then the expected codeword length satisfies

$$ H(p) + D(p \,\|\, q) \le E_p\, l(X) < H(p) + D(p \,\|\, q) + 1. $$

That is, the mistaken distribution costs us an extra *D*(*p*‖*q*) bits per symbol.