
Arithmetic Coding



Arithmetic coding overcomes some of the problems of Huffman coding, in particular the potential surplus of up to one bit per symbol. It operates as a human might, using information already observed to predict what might be coming, and coding based on that prediction. In addition, the technique explicitly separates the prediction portion from the encoding portion.

In arithmetic coding, a bit sequence is interpreted as an interval on the real line from 0 to 1. For example, 01 is interpreted as 0.01..., which corresponds (not knowing what the following digits are) to the interval [0.01, 0.10) in binary, which is [0.25, 0.5) in base ten. (Note the brackets: the interval is closed on the left and open on the right.) A longer string 01101 corresponds to the interval [0.01101, 0.01110). The longer the string, the shorter the interval it represents on the real line.

Assume we are dealing with an alphabet $\mathcal{A} = \{a_1, \ldots, a_I\}$, where $a_I$ is a special symbol meaning "end of transmission.'' The source produces the sequence $x_1, x_2, \ldots, x_n, \ldots$, which is not necessarily i.i.d. We further assume (or model) that there is a predictor which computes, or estimates,
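The bit-string-to-interval correspondence can be sketched in a few lines of Python, using exact rationals to avoid floating-point issues (the function name is illustrative):

```python
from fractions import Fraction

def bit_string_interval(bits):
    """Interpret a bit string b1 b2 ... bL as the half-open interval
    [0.b1b2...bL, 0.b1b2...bL + 2**-L) on the real line."""
    width = Fraction(1, 2 ** len(bits))
    low = Fraction(int(bits, 2), 2 ** len(bits))
    return low, low + width

print(bit_string_interval("01"))     # (Fraction(1, 4), Fraction(1, 2))
print(bit_string_interval("01101"))  # (Fraction(13, 32), Fraction(7, 16))
```

Each extra bit halves the width of the interval, which is how a longer string pins down a shorter interval.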


\begin{displaymath}P(x_n = a_i\vert x_1,x_2,\ldots, x_{n-1}),\end{displaymath}


which is available at both encoder and decoder. We divide the segment [0,1) into I intervals whose lengths are equal to the probabilities $P(x_1 = a_i), i=1,2,\ldots, I$ . The first interval is


\begin{displaymath}[0, P(x_1 = a_1)).\end{displaymath}


The second interval is


\begin{displaymath}[P(x_1 = a_1), P(x_1 = a_1) + P(x_1 = a_2)),\end{displaymath}


and so forth. More generally, to provide for the possibility of considering symbols other than just $x_1$, we define the lower and upper cumulative probabilities:


\begin{displaymath}Q_n(a_i\vert x_1,\ldots, x_{n-1}) = \sum_{j=1}^{i-1} P(x_n =
a_j\vert x_1,\ldots, x_{n-1})\end{displaymath}



\begin{displaymath}R_n(a_i\vert x_1,\ldots, x_{n-1}) = \sum_{j=1}^{i} P(x_n =
a_j\vert x_1,\ldots, x_{n-1})\end{displaymath}
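For a fixed conditioning history, these cumulative sums can be sketched as follows (a minimal sketch; the function name and the example distribution are illustrative):

```python
from fractions import Fraction

def cumulative_bounds(probs, i):
    """Lower and upper cumulative probabilities Q(a_i) and R(a_i)
    for the 1-indexed symbol a_i, given probs = [P(a_1), ..., P(a_I)]."""
    q = sum(probs[:i - 1], Fraction(0))  # Q: mass strictly below a_i
    r = q + probs[i - 1]                 # R: Q plus P(a_i)
    return q, r

P = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
print(cumulative_bounds(P, 2))  # (Fraction(1, 2), Fraction(4, 5))
```

The symbol $a_i$ then occupies the subinterval $[Q(a_i), R(a_i))$, whose length is exactly $P(a_i)$.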


Then, for example, $a_2$ corresponds to the interval $[Q_1(a_2), R_1(a_2))$. Now we represent the probabilities for the next symbol. Take, for example, the interval for $a_1$, and subdivide it into intervals $a_1a_1, a_1a_2, \ldots, a_1a_I$, so that the length of the interval for $a_1 a_j$ is proportional to $P(a_j\vert a_1)$. In fact, we take the length of the subinterval for $a_1 a_j$ to be


\begin{displaymath}P(x_1 = a_1, x_2 = a_j) = P(x_1 = a_1) P(x_2 = a_j\vert x_1 = a_1).\end{displaymath}


Then we note that the sum of the lengths of these subintervals will be


\begin{displaymath}\sum_j P(x_1=a_1,x_2 = a_j) = P(x_1=a_1),\end{displaymath}


which sure enough is the correct length. More generally, we subdivide each of the intervals similarly, so that the subinterval for $a_i a_j$ has length


\begin{displaymath}P(x_1 = a_i, x_2 = a_j) = P(x_1 = a_i) P(x_2 = a_j\vert x_1 = a_i).\end{displaymath}


Then, we continue subdividing each subinterval for strings of length $N$. The following algorithm (MacKay, p. 151) shows how to compute the interval $[u, v)$ for the string $x_1 x_2 \ldots x_N$. (Note: this is for demonstration purposes, since it requires infinite-precision arithmetic. In practice, the algorithm is arranged so that infinite precision is not required.)

    u := 0
    v := 1
    p := v - u
    for n = 1, 2, ..., N
        compute the cumulative probabilities Q_n and R_n given x_1, ..., x_{n-1}
        v := u + p * R_n(x_n | x_1, ..., x_{n-1})
        u := u + p * Q_n(x_n | x_1, ..., x_{n-1})
        p := v - u        (new length = new probability)
    end for
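A direct rendering of this interval computation in Python, using exact rationals (the predictor interface `model(history)` is a stand-in for whatever conditional model is assumed available at encoder and decoder):

```python
from fractions import Fraction

def symbol_interval(sequence, model):
    """Compute the interval [u, v) for a symbol sequence.
    model(history) returns a dict {symbol: P(x_n = symbol | history)}
    whose insertion order fixes the symbol ordering."""
    u, v = Fraction(0), Fraction(1)
    history = []
    for x in sequence:
        p = v - u                   # current interval length
        q = Fraction(0)             # running lower cumulative Q_n
        for sym, prob in model(history).items():
            if sym == x:
                # shrink [u, v) to the subinterval for this symbol
                u, v = u + p * q, u + p * (q + prob)
                break
            q += prob
        history.append(x)
    return u, v

# i.i.d. bent-coin model: P(a) = 3/4, P(b) = 1/4 regardless of history
model = lambda history: {"a": Fraction(3, 4), "b": Fraction(1, 4)}
print(symbol_interval("ab", model))  # (Fraction(9, 16), Fraction(3, 4))
```

Because the model receives the full history, the same sketch covers non-i.i.d. predictors as well; only the lambda would change.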

In encoding, the interval is subdivided for each new symbol. To encode the string $x_1,x_2,\ldots, x_N$ , we send the binary string whose interval lies within the interval determined by the sequence.
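One way to find such a binary string is to bisect toward the midpoint of the interval until the dyadic interval of the bits emitted so far lies entirely inside it. This is a sketch of the idea, not necessarily the arrangement used in practical coders:

```python
from fractions import Fraction

def codeword_in(u, v):
    """A bit string whose dyadic interval [0.bits, 0.bits + 2**-L)
    lies entirely within [u, v), found by bisecting toward the midpoint."""
    target = (u + v) / 2
    bits, low, width = "", Fraction(0), Fraction(1)
    while not (u <= low and low + width <= v):
        width /= 2
        if low + width <= target:   # follow the half containing the midpoint
            low += width
            bits += "1"
        else:
            bits += "0"
    return bits

print(codeword_in(Fraction(9, 16), Fraction(3, 4)))  # 101 -> [5/8, 3/4)
```

The loop stops as soon as the dyadic width drops below half the interval length, which is what bounds the output at roughly $-\log_2(v-u) + 2$ bits.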

Example (MacKay, p. 151): suppose a bent coin with outcomes {\tt a} and {\tt b}, together with an end-of-transmission symbol, is encoded by this procedure. When the end-of-transmission symbol is reached, the decoder knows that the end of file has been reached.

One of the benefits of arithmetic coding is that the worst-case redundancy for an entire bit string (which may, for example, consist of an entire file) is at most two bits, assuming the probabilistic model is correct. Given a probabilistic model $\mathcal{H}$, the ideal message length for a sequence $\mathbf{x}$ is $l(\mathbf{x}\vert\mathcal{H}) =
-\log_2 P(\mathbf{x}\vert\mathcal{H})$. In the worst case, the interval of length $P(\mathbf{x}\vert\mathcal{H})$ falls just barely between two binary (dyadic) intervals, and the largest binary interval guaranteed to fit inside it is smaller by a factor of 4. This factor of 4 corresponds to $\log_2 4 = 2$ bits of overhead in the worst case.
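To see the bound concretely: among dyadic intervals of width $2^{-l}$, at least one fits inside any interval of length $p$ whenever $2 \cdot 2^{-l} \le p$, so a length $l = \lceil -\log_2 p \rceil + 1$ always suffices and exceeds the ideal $-\log_2 p$ by strictly less than 2. A small numeric check (the function name is illustrative):

```python
from math import ceil, log2

def sufficient_length(p):
    """Codeword length l such that some dyadic interval of width 2**-l
    fits inside any interval of length p (requires 2 * 2**-l <= p)."""
    return ceil(-log2(p)) + 1

for p in [3 / 16, 1 / 4, 0.01]:
    l = sufficient_length(p)
    assert 2 * 2 ** -l <= p       # a dyadic interval of width 2**-l fits
    assert l - (-log2(p)) < 2     # redundancy strictly under 2 bits
```

Since the two extra bits are paid once per message rather than once per symbol, the per-symbol overhead vanishes for long strings.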

Copyright 2008, Todd Moon. Arithmetic Coding. Free Online Course Materials — USU OpenCourseWare. This work is licensed under a Creative Commons License.