# Arithmetic Coding


## Introduction

Arithmetic coding overcomes some of the problems of Huffman coding, in particular the potential surplus of up to one bit per symbol. It operates as a human might, using information already observed to predict what might be coming, and coding based on the prediction. In addition, the technique explicitly separates the prediction portion from the encoding portion.

In arithmetic coding (AC), a bit sequence is interpreted as an interval on the real line from 0 to 1. For example, `01` is interpreted as 0.01..., which corresponds (not knowing what the following digits are) to the interval [0.01, 0.10) in binary, which is [0.25, 0.5) in base ten. (Note the brackets: the interval is closed on the left and open on the right, so it includes 0.25 but not 0.5.) A longer string, `01101`, corresponds to the interval [0.01101, 0.01110). The longer the string, the shorter the interval represented on the real line.

Assume we are dealing with an alphabet
$\{a_1, a_2, \ldots, a_I\}$, where $a_I$ is a special symbol meaning "end of transmission." The source produces the sequence $x_1, x_2, \ldots, x_n, \ldots$, which is not necessarily i.i.d. We further assume (or model) that there is a predictor which computes, or estimates,

$$P(x_n = a_i \mid x_1, \ldots, x_{n-1}),$$

which is available at both encoder and decoder. We divide the segment $[0,1)$ into $I$ intervals whose lengths are equal to the probabilities $P(x_1 = a_i)$, $i = 1, \ldots, I$. The first interval is

$$[0,\; P(x_1 = a_1)).$$

The second interval is

$$[P(x_1 = a_1),\; P(x_1 = a_1) + P(x_1 = a_2)),$$

and so forth. More generally, to allow for symbols other than just $x_1$, we define the lower and upper cumulative probabilities:

$$Q_n(a_i \mid x_1, \ldots, x_{n-1}) = \sum_{i'=1}^{i-1} P(x_n = a_{i'} \mid x_1, \ldots, x_{n-1}),$$

$$R_n(a_i \mid x_1, \ldots, x_{n-1}) = \sum_{i'=1}^{i} P(x_n = a_{i'} \mid x_1, \ldots, x_{n-1}).$$

Then, for example, $a_2$ corresponds to the interval $[Q_1(a_2), R_1(a_2))$.

Now we represent the probabilities for the next symbol. Take, for example, the interval for $a_1$, and subdivide it into intervals for $a_1 a_1, a_1 a_2, \ldots, a_1 a_I$, so that the length of the interval for $a_1 a_j$ is *proportional* to $P(x_2 = a_j \mid x_1 = a_1)$. In fact, we take the length of the subinterval for $a_1 a_j$ to be

$$P(x_1 = a_1, x_2 = a_j) = P(x_1 = a_1)\, P(x_2 = a_j \mid x_1 = a_1).$$

Then we note that the sum of the lengths of these subintervals will be

$$\sum_{j=1}^{I} P(x_1 = a_1, x_2 = a_j) = P(x_1 = a_1) \sum_{j=1}^{I} P(x_2 = a_j \mid x_1 = a_1) = P(x_1 = a_1),$$

which, sure enough, is the correct length. More generally, we subdivide the interval for each $a_i$ similarly, so that the subinterval for $a_i a_j$ has length

$$P(x_1 = a_i, x_2 = a_j) = P(x_1 = a_i)\, P(x_2 = a_j \mid x_1 = a_i).$$
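The two-level subdivision described above can be sketched in Python. This is only an illustration: the three-symbol alphabet, the probability tables `FIRST` and `SECOND`, and the function name `subdivide` are all made up for the example, and exact fractions are used so the interval endpoints can be checked by hand.

```python
from fractions import Fraction

# Hypothetical alphabet and model, chosen only for illustration.
ALPHABET = ("a1", "a2", "a3")

# FIRST[s] = P(x1 = s)
FIRST = {"a1": Fraction(1, 2), "a2": Fraction(1, 4), "a3": Fraction(1, 4)}

# SECOND[prev][cur] = P(x2 = cur | x1 = prev)
SECOND = {
    "a1": {"a1": Fraction(1, 4), "a2": Fraction(1, 2), "a3": Fraction(1, 4)},
    "a2": {"a1": Fraction(1, 3), "a2": Fraction(1, 3), "a3": Fraction(1, 3)},
    "a3": {"a1": Fraction(1, 2), "a2": Fraction(1, 4), "a3": Fraction(1, 4)},
}


def subdivide():
    """Map each pair (x1, x2) to its half-open interval (lo, hi) inside [0, 1)."""
    intervals = {}
    lo = Fraction(0)
    for first in ALPHABET:
        for second in ALPHABET:
            # Subinterval length is the joint probability P(x1) * P(x2 | x1).
            width = FIRST[first] * SECOND[first][second]
            intervals[(first, second)] = (lo, lo + width)
            lo += width
    return intervals


intervals = subdivide()
# The subintervals for a1 a1, a1 a2, a1 a3 together cover exactly [0, P(x1 = a1)).
a1_width = sum(hi - lo for (first, _), (lo, hi) in intervals.items() if first == "a1")
```

Here `a1_width` equals `FIRST["a1"]`, which is the "correct length" check made in the text.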

Then we continue subdividing each subinterval for strings of length $N$. The following algorithm (MacKay, p. 151) shows how to compute the interval $[u, v)$ for the string $x_1 x_2 \ldots x_N$. (Note: this is for demonstration purposes, since it requires infinite precision arithmetic. In practice, the algorithm is arranged so that infinite precision is not required.)
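The interval computation can be sketched in Python as below. This is a minimal sketch of the iterative update (start from $[0,1)$, then for each symbol shrink the interval using the lower and upper cumulative probabilities), not the cited algorithm verbatim. The names `string_interval`, `predictor`, and `fixed_model` are hypothetical, the example model is i.i.d. purely for brevity, and `fractions.Fraction` stands in for the infinite precision the note mentions.

```python
from fractions import Fraction


def string_interval(symbols, predictor):
    """Return the interval [u, v) assigned to a sequence of symbols.

    `predictor(context)` is an assumed callback: given the tuple of symbols
    seen so far, it returns a list of (symbol, probability) pairs for the
    next symbol, in a fixed alphabet order.
    """
    u, v = Fraction(0), Fraction(1)
    context = []
    for x in symbols:
        p = v - u  # width of the current interval
        lower = Fraction(0)  # lower cumulative probability of x
        for symbol, prob in predictor(tuple(context)):
            if symbol == x:
                upper = lower + prob  # upper cumulative probability of x
                break
            lower += prob
        else:
            raise ValueError(f"symbol {x!r} is not in the alphabet")
        # Both new endpoints are computed from the old u and the old width p.
        u, v = u + p * lower, u + p * upper
        context.append(x)
    return u, v


def fixed_model(context):
    # Illustrative model that ignores the context; '$' plays the role of
    # the end-of-transmission symbol.
    return [("a", Fraction(1, 2)), ("b", Fraction(1, 4)), ("$", Fraction(1, 4))]


u, v = string_interval("ab$", fixed_model)
```

For this model the result is $[11/32, 3/8)$, whose width $1/32$ equals the product of the three symbol probabilities $\tfrac12 \cdot \tfrac14 \cdot \tfrac14$.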

In encoding, the interval is subdivided for each new symbol. To encode the string $x_1 x_2 \ldots x_N$, we send the binary string whose interval lies *within* the interval determined by the sequence.
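One way to pick such a binary string is to try codeword lengths in increasing order until some binary interval fits. The sketch below assumes the sequence interval $[u, v)$ is already known as exact fractions; the function names `codeword_interval` and `shortest_codeword` are hypothetical, and a practical coder would emit bits incrementally rather than search like this.

```python
import math
from fractions import Fraction


def codeword_interval(bits):
    """Interval [lo, hi) of all reals in [0, 1) whose binary expansion starts with `bits`."""
    scale = 2 ** len(bits)
    k = int(bits, 2)
    return Fraction(k, scale), Fraction(k + 1, scale)


def shortest_codeword(u, v):
    """Shortest bit string whose interval lies within [u, v), for 0 <= u < v <= 1."""
    length = 1
    while True:
        scale = 2 ** length
        # Leftmost grid point k / scale that is not below u. If any binary
        # interval of this length fits, the one starting here fits too.
        k = math.ceil(u * scale)
        if Fraction(k + 1, scale) <= v:
            return format(k, f"0{length}b")
        length += 1


word = shortest_codeword(Fraction(11, 32), Fraction(3, 8))
lo, hi = codeword_interval(word)
```

For the interval $[11/32, 3/8)$ this search returns `01011`, and for $[1/4, 1/2)$ it returns `01`, matching the correspondence between `01` and $[0.25, 0.5)$ described in the introduction.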

One of the benefits of arithmetic coding is that the worst-case redundancy *for an entire bit string* (which may, for example, consist of an entire file) is **at most two bits**, assuming the probabilistic model is correct. Given a probabilistic model $P$, the ideal message length for a sequence $x_1 x_2 \ldots x_N$ is $\log_2 \frac{1}{P(x_1, \ldots, x_N)}$ bits. Suppose that the interval $[u, v)$ for the sequence, whose width is $P(x_1, \ldots, x_N)$, is *just barely* between two binary intervals. Then the next smaller binary intervals contained in $[u, v)$ are smaller by a factor of 4. This factor of 4 corresponds to $\log_2 4 = 2$ bits of overhead in the worst case.
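The two-bit bound can be checked with a short derivation. The symbols $p$, $L$, and $k$ are introduced here only for this argument: $p$ is the width of the sequence interval $[u, v)$, $L$ is a candidate codeword length, and $k$ indexes a binary interval $[k\,2^{-L}, (k+1)\,2^{-L})$.

```latex
\begin{align*}
p &= v - u = P(x_1, \ldots, x_N), \\
L &= \left\lceil \log_2 \tfrac{1}{p} \right\rceil + 1
  \quad\Longrightarrow\quad 2^{-L} \le \tfrac{p}{2}, \\
k &= \left\lceil u\, 2^{L} \right\rceil
  \quad\Longrightarrow\quad u \le k\, 2^{-L} < u + 2^{-L}, \\
(k+1)\, 2^{-L} &< u + 2 \cdot 2^{-L} \le u + p = v, \\
L &< \log_2 \tfrac{1}{p} + 2 .
\end{align*}
```

So a binary interval of length $2^{-L}$ always fits inside $[u, v)$, and its codeword is fewer than two bits longer than the ideal length $\log_2 \frac{1}{p}$.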