Arithmetic Coding

Introduction

Arithmetic coding overcomes some of the problems of Huffman coding, in particular the potential surplus of up to one bit per symbol. It operates as a human might, using information already observed to predict what might be coming, and coding based on the prediction. In addition, the technique explicitly separates the prediction portion from the encoding portion.

In arithmetic coding (AC), a bit sequence is interpreted as an interval on the real line from 0 to 1. For example, 01 is interpreted as 0.01..., which corresponds (not knowing what the following digits are) to the interval [0.01, 0.10) in binary, which is [0.25, 0.5) in base ten. (Note the brackets: each interval includes its left endpoint and excludes its right endpoint.) A longer string 01101 corresponds to the interval [0.01101, 0.01110). The longer the string, the shorter the interval represented on the real line.

Assume we are dealing with an alphabet $\{a_1, a_2, \ldots, a_I\}$ which is available at both encoder and decoder. We divide the segment $[0, 1)$ into $I$ intervals whose lengths are equal to the probabilities $P(x_1 = a_i)$, $i = 1, 2, \ldots, I$. The first interval is
$$[0, P(x_1 = a_1)).$$
The second interval is
$$[P(x_1 = a_1), P(x_1 = a_1) + P(x_1 = a_2)),$$
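and so forth. More generally, to provide for the possibility of considering symbols other than just $x_1$, we define the lower and upper cumulative probabilities:
$$Q_n(a_i \mid x_1, \ldots, x_{n-1}) = \sum_{i'=1}^{i-1} P(x_n = a_{i'} \mid x_1, \ldots, x_{n-1}),$$
$$R_n(a_i \mid x_1, \ldots, x_{n-1}) = \sum_{i'=1}^{i} P(x_n = a_{i'} \mid x_1, \ldots, x_{n-1}).$$
Then, for example, $a_2$ corresponds to the interval $[Q_1(a_2), R_1(a_2))$.

Now we represent the probabilities for the next symbol. Take, for example, the interval for $a_1$, and subdivide it into intervals for the strings $a_1 a_1, a_1 a_2, \ldots, a_1 a_I$, where the interval for $a_1 a_j$ has length
$$P(x_1 = a_1, x_2 = a_j) = P(x_1 = a_1) P(x_2 = a_j \mid x_1 = a_1).$$
Then we note that the sum of the lengths of these subintervals will be
$$\sum_{j=1}^{I} P(x_1 = a_1, x_2 = a_j) = P(x_1 = a_1) \sum_{j=1}^{I} P(x_2 = a_j \mid x_1 = a_1) = P(x_1 = a_1),$$
which sure enough is the correct length. More generally, we subdivide the interval for each $a_i$ into subintervals for the strings $a_i a_j$, each having length
$$P(x_1 = a_i, x_2 = a_j) = P(x_1 = a_i) P(x_2 = a_j \mid x_1 = a_i).$$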
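
The two kinds of interval discussed so far, the one belonging to a bit string and the one belonging to a symbol string, can be sketched in Python. This is a minimal illustration rather than a production coder. The names `bits_to_interval`, `string_interval` and `iid_model` are hypothetical, the three-symbol model is made up for the example, and exact fractions are used in place of the finite-precision arithmetic a real implementation would need.

```python
from fractions import Fraction


def bits_to_interval(bits):
    """Interval [lo, hi) of all reals in [0, 1) whose binary expansion starts with `bits`."""
    width = Fraction(1, 2 ** len(bits))
    lo = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    return lo, lo + width


def string_interval(symbols, model):
    """Interval [u, v) for a symbol string, found by repeated subdivision.

    `model(prefix)` is assumed to return a dict mapping each symbol a_i to
    P(x_n = a_i | prefix); the dict order defines a_1, a_2, ...
    """
    u, v = Fraction(0), Fraction(1)
    for n, x in enumerate(symbols):
        probs = model(symbols[:n])
        p = v - u                    # length of the current interval
        q = Fraction(0)              # lower cumulative probability of x
        for a, pa in probs.items():
            if a == x:
                break
            q += pa
        else:
            raise ValueError(f"symbol {x!r} is not in the alphabet")
        r = q + probs[x]             # upper cumulative probability of x
        u, v = u + p * q, u + p * r  # keep only the subinterval for x
    return u, v


def iid_model(prefix):
    """Hypothetical model: symbols are independent, so the prefix is ignored."""
    return {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}


print(bits_to_interval("01"))            # (Fraction(1, 4), Fraction(1, 2))
print(string_interval("ab", iid_model))  # (Fraction(1, 4), Fraction(3, 8))
```

In this example the interval for the string ab has length 1/8, which equals P(a)P(b) = (1/2)(1/4), in agreement with the product rule above.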
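Then, we continue subdividing each subinterval for strings of length $N$. MacKay (p. 151) gives an algorithm for computing the interval $[u, v)$ for the string $x_1 x_2 \cdots x_N$: starting from $u = 0$ and $v = 1$, each symbol $x_n$ in turn replaces the interval by $[u + p\,Q_n(x_n \mid x_1, \ldots, x_{n-1}),\; u + p\,R_n(x_n \mid x_1, \ldots, x_{n-1}))$, where $p = v - u$ is the current interval length and $Q_n$ and $R_n$ are the lower and upper cumulative probabilities.

In encoding, the interval is subdivided for each new symbol. To encode the string $x_1 x_2 \cdots x_N$, we send a binary string whose interval lies within the interval $[u, v)$ determined by the string.

One of the benefits of arithmetic coding is that the worst-case redundancy for an entire string (which may, for example, consist of an entire file) is at most two bits, assuming the probabilistic model is correct. Given a probabilistic model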
Copyright 2008, Todd Moon.
Cite/attribute resource: admin. (2006, May 15). Arithmetic Coding. Retrieved November 23, 2009, from Free Online Course Materials — USU OpenCourseWare Web site: http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture7_1.htm.
This work is licensed under a Creative Commons License.
