Arithmetic Coding

Probability models

The performance of an arithmetic coder (AC) depends on having a good model of the source probabilities: the better the model, the better the code can be expected to perform. In principle, any probabilistic model can be used. We mention here some concepts that are useful in developing one.

Suppose, as before, that we deal with the case of independent events. We have outcomes a, b, and $\square$, with probabilities $p_{\tt a}$, $p_{\tt b}$, and $p_{\square}$. Let l be the number of outcomes (the number of coin tosses). The probability $p_{\tt a}$ could be anywhere in the range [0,1], and we may not have any predisposition toward one value. We model this ambivalence by saying that


\begin{displaymath}P(p_{\tt a}) = 1 \qquad \text{for } p_{\tt a}\in [0,1].\end{displaymath}


That is, $p_{\tt a}$ is uniformly distributed. This is a prior probability. If we had some predisposition about $p_{\tt a}$, it could be incorporated into the prior model (using something like a beta distribution, for example). The whole point of Bayesian estimation (which is what we are doing here) is to merge our prior inclinations with the observations. This is a problem of inference, which we can state this way: given a sequence of F bits, of which $F_{\tt a}$ are a's and $F_{\tt b}$ are b's, infer $p_{\tt a}$. The inference is accomplished by the posterior ("after") probability, that is, the probability of $p_{\tt a}$ after a measurement $\mathbf{s}$ (the observed sequence) is made. By Bayes' rule we write


\begin{displaymath}P(p_{\tt a}\vert\mathbf{s},F) = \frac{P(\mathbf{s}\vert p_{\tt a},F)P(p_{\tt a})}{P(\mathbf{s}\vert F)}.\end{displaymath}

Why write it this way? Because the conditional probability in the numerator (the likelihood) is easy to write down:


\begin{displaymath}P(\mathbf{s}\vert p_{\tt a},F) = p_{\tt a}^{F_{\tt a}}(1-p_{\tt a})^{F_{\tt b}}.\end{displaymath}


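This holds because the outcomes are independent: each a in the sequence contributes a factor $p_{\tt a}$ and each b contributes a factor $1-p_{\tt a}$. As we have seen elsewhere, the conditional probability that is easiest to write down is usually not the one we actually need. We also find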


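\begin{displaymath}P(\mathbf{s}\vert F) = \int_0^1 P(\mathbf{s}\vert p_{\tt a},F)P(p_{\tt a})\, dp_{\tt a}
= \int_0^1 p_{\tt a}^{F_{\tt a}}(1-p_{\tt a})^{F_{\tt b}}\, dp_{\tt a}
= \frac{\Gamma(F_{\tt a}+1)\Gamma(F_{\tt b}+1)}{\Gamma(F_{\tt a}+F_{\tt b}+2)}
= \frac{F_{\tt a}!\, F_{\tt b}!}{(F_{\tt a}+F_{\tt b}+1)!}.\end{displaymath}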
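
As a quick numerical check of this closed form, the following Python sketch compares the factorial expression $F_{\tt a}!\,F_{\tt b}!/(F_{\tt a}+F_{\tt b}+1)!$ with a midpoint-rule approximation of the integral, assuming the uniform prior used above. The function names and the number of integration steps are illustrative choices, not part of the original notes.

```python
from math import factorial


def evidence_closed_form(f_a: int, f_b: int) -> float:
    """F_a! F_b! / (F_a + F_b + 1)!, the evidence under a uniform prior."""
    return factorial(f_a) * factorial(f_b) / factorial(f_a + f_b + 1)


def evidence_numeric(f_a: int, f_b: int, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of p^F_a (1-p)^F_b on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += p ** f_a * (1.0 - p) ** f_b
    return total * h


if __name__ == "__main__":
    # (2, 1) corresponds to the counts of the sequence "aba".
    for f_a, f_b in [(2, 1), (0, 0), (5, 3)]:
        print(f_a, f_b, evidence_closed_form(f_a, f_b), evidence_numeric(f_a, f_b))
```

The two columns should agree to many decimal places; for the counts (2, 1) the closed form is 2!1!/4! = 1/12.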


So we could infer $p_{\tt a}$ as the most probable value (the maximizer) of the posterior. For example, we find $P(p_{\tt a}\vert\mathbf{s} = {\tt a}{\tt b}{\tt a},F=3) \propto p_{\tt a}^2(1-p_{\tt a})$, with a maximum at $p_{\tt a} = 2/3$. Or we could infer based on the posterior mean, which is 3/5. We also want to be able to make predictions. Given a sequence $\mathbf{s}$ of length F as evidence, we find the probability that the next draw is an a as


\begin{displaymath}P({\tt a}\vert\mathbf{s},F) = \int P({\tt a}\vert p_{\tt a}) P(p_{\tt a}\vert\mathbf{s},F)\, dp_{\tt a}.\end{displaymath}


Note that in this case we are using the entire posterior probability, so we incorporate all of our uncertainty about $p_{\tt a}$. We also have $P({\tt a}\vert p_{\tt a}) = p_{\tt a}$ (by its definition), so our predictor is


\begin{displaymath}P({\tt a}\vert\mathbf{s},F) = \int_0^1 p_{\tt a}
\frac{p_{\tt a}^{F_{\tt a}}(1-p_{\tt a})^{F_{\tt b}}}{P(\mathbf{s}\vert F)}\, dp_{\tt a} =
\frac{F_{\tt a}+1}{F_{\tt a}+ F_{\tt b}+ 2}.\end{displaymath}
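
The last equality can be checked by using the same integral identity twice; this is a sketch that assumes the uniform prior from above. The evidence is $P(\mathbf{s}\vert F) = F_{\tt a}!\,F_{\tt b}!/(F_{\tt a}+F_{\tt b}+1)!$, and the numerator is the same integral with $F_{\tt a}$ replaced by $F_{\tt a}+1$:

\begin{displaymath}
\int_0^1 p_{\tt a}^{F_{\tt a}+1}(1-p_{\tt a})^{F_{\tt b}}\, dp_{\tt a}
= \frac{(F_{\tt a}+1)!\, F_{\tt b}!}{(F_{\tt a}+F_{\tt b}+2)!},
\qquad
\frac{(F_{\tt a}+1)!\, F_{\tt b}!}{(F_{\tt a}+F_{\tt b}+2)!}\cdot
\frac{(F_{\tt a}+F_{\tt b}+1)!}{F_{\tt a}!\, F_{\tt b}!}
= \frac{F_{\tt a}+1}{F_{\tt a}+F_{\tt b}+2}.
\end{displaymath}

For the sequence aba ($F_{\tt a}=2$, $F_{\tt b}=1$) this gives 3/5, which is the posterior mean quoted earlier.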


This update rule is known as Laplace's rule, and is the rule that was used in the coder above. With $i$ ranging over the symbols of the alphabet, we could write this as


\begin{displaymath}P_L({\tt a}\vert x_1,\ldots, x_{n-1}) = \frac{F_{\tt a}+1}{\sum_{i} (F_i + 1)}.\end{displaymath}
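
To make the point estimate and Laplace's rule concrete, here is a minimal Python sketch using exact fractions. The function names and the representation of the history as a string are hypothetical choices made for illustration; the mode formula assumes a two-symbol source and a uniform prior.

```python
from collections import Counter
from fractions import Fraction


def posterior_mode(history: str, symbol: str = "a") -> Fraction:
    """Maximizer of p^F_a (1-p)^F_b for a two-symbol source: F_a / (F_a + F_b)."""
    if not history:
        raise ValueError("the posterior is flat for an empty history; no unique mode")
    return Fraction(history.count(symbol), len(history))


def laplace_probability(symbol: str, history: str, alphabet: str) -> Fraction:
    """Laplace's rule: (F_symbol + 1) / sum_i (F_i + 1) over the alphabet."""
    counts = Counter(history)  # symbols not yet seen have count 0
    return Fraction(counts[symbol] + 1, sum(counts[s] + 1 for s in alphabet))


if __name__ == "__main__":
    print(posterior_mode("aba"))                  # 2/3
    print(laplace_probability("a", "aba", "ab"))  # 3/5
    print(laplace_probability("b", "aba", "ab"))  # 2/5
```

Unlike the mode, Laplace's rule never assigns probability zero to a symbol that has not yet been seen, which matters for a coder because a zero-probability symbol cannot be encoded.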


Another model, known as the Dirichlet model, is more "responsive":


\begin{displaymath}P_D({\tt a}\vert x_1,\ldots, x_{n-1}) = \frac{F_{\tt a}+\alpha}{\sum_{i} (F_i + \alpha)}.\end{displaymath}

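Typically, $\alpha$ is small, for example 0.01, so the observed counts dominate the prediction after only a few symbols; setting $\alpha = 1$ recovers Laplace's rule. These are not the only possible rules, and neither takes into account any dependence that might exist between successive symbols.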
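
The difference in responsiveness can be seen in a short Python sketch that evaluates both rules on the same counts and adds up the ideal code length $-\log_2 P$ of a sequence under each adaptive model. The function names, the two-symbol alphabet, and the test sequences are assumptions made for illustration; this is not the coder referred to above.

```python
from collections import Counter
from math import log2


def dirichlet_probability(symbol, counts, alphabet, alpha):
    """(F_symbol + alpha) / sum_i (F_i + alpha); alpha = 1 gives Laplace's rule."""
    return (counts[symbol] + alpha) / sum(counts[s] + alpha for s in alphabet)


def ideal_code_length(sequence, alphabet, alpha):
    """Sum of -log2 P(x_n | x_1, ..., x_{n-1}) under the adaptive model, in bits."""
    counts = Counter()
    bits = 0.0
    for symbol in sequence:
        bits += -log2(dirichlet_probability(symbol, counts, alphabet, alpha))
        counts[symbol] += 1  # update the counts only after the symbol is coded
    return bits


if __name__ == "__main__":
    alphabet = "ab"
    history = Counter("aaaa")
    for alpha in (1.0, 0.01):
        # Probability of another "a" after four in a row.
        print(alpha, dirichlet_probability("a", history, alphabet, alpha))
    for alpha in (1.0, 0.01):
        # Ideal cost in bits of twenty consecutive "a" symbols.
        print(alpha, ideal_code_length("a" * 20, alphabet, alpha))
```

After four a's, Laplace's rule predicts another a with probability 5/6, while the Dirichlet model with $\alpha = 0.01$ predicts it with probability 4.01/4.02. The smaller $\alpha$ is cheaper on a highly skewed source, but it pays more bits when a rarely seen symbol does occur.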

Copyright 2008, by the Contributing Authors. admin. (2006, May 17). Arithmetic Coding. Retrieved January 07, 2011, from Free Online Course Materials — USU OpenCourseWare Web site. This work is licensed under a Creative Commons License.