The Asymptotic Equipartition Property
Convergence in probability
Before jumping into the theorem, we need to discuss what it means to converge in probability. There are a variety of ways that things can converge in the world of probability: convergence in distribution (as exhibited by the central limit theorem), convergence almost surely (which is very strong), convergence in mean square, and convergence in probability. We could spend about three weeks working through what each of these means and which implies which. For the moment, however, we will focus only on convergence in probability.
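A sequence of random variables $X_n$ converges in probability to $X$ if, for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$. Recalling the definition of convergence of sequences, we can say this as follows: for any $\epsilon > 0$ and for any $\delta > 0$, there is an $n_0$ such that
$$P(|X_n - X| > \epsilon) < \delta$$
for all $n > n_0$.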
One way of quantifying how we are doing on the convergence is by means of Markov's inequality: for a positive r.v. $X$ and any $\delta > 0$,
$$P(X \ge \delta) \le \frac{E[X]}{\delta}.$$
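From this we can derive the Chebyshev inequality: for a r.v. $Y$ with mean $\mu$ and variance $\sigma^2$, and any $\epsilon > 0$,
$$P(|Y - \mu| > \epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$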
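The derivation is short enough to write out. As a sketch, write $\mu$ and $\sigma^2$ for the mean and variance of $Y$ and $\epsilon > 0$ for the deviation threshold (these symbol names are assumed here), and apply Markov's inequality to the nonnegative r.v. $(Y - \mu)^2$ with threshold $\epsilon^2$:

```latex
\begin{align*}
P(|Y - \mu| > \epsilon)
  &= P\big((Y - \mu)^2 > \epsilon^2\big) \\
  &\le \frac{E\big[(Y - \mu)^2\big]}{\epsilon^2}
   = \frac{\sigma^2}{\epsilon^2}.
\end{align*}
```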
From this we can show convergence of the sample mean (the WLLN).
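To see the WLLN numerically: for i.i.d. draws with mean $\mu$ and variance $\sigma^2$, the sample mean has variance $\sigma^2/n$, so Chebyshev gives $P(|\text{sample mean} - \mu| > \epsilon) \le \sigma^2/(n\epsilon^2)$, which goes to zero as $n$ grows. The following is a minimal sketch, assuming a Bernoulli(p) source; the function names, the seed, and the trial count are illustrative choices, not part of the original notes.

```python
import random

def deviation_frequency(n, eps, p=0.5, trials=200, seed=0):
    """Estimate P(|sample mean - p| > eps) for n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    exceed = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) > eps:
            exceed += 1
    return exceed / trials

def chebyshev_bound(n, eps, p=0.5):
    """Chebyshev bound sigma^2 / (n * eps^2) for the sample mean, capped at 1."""
    variance = p * (1 - p)  # variance of a single Bernoulli(p) draw
    return min(1.0, variance / (n * eps ** 2))

# Compare the empirical deviation frequency with the bound as n grows.
for n in (10, 100, 1000):
    print(n, deviation_frequency(n, 0.05), chebyshev_bound(n, 0.05))
```

The bound is loose (it is capped at 1 for small n), but it is enough to force the deviation probability to zero, which is all that convergence in probability requires.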
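Now the AEP: if $X_1, X_2, \ldots$ are i.i.d. with distribution $p(x)$, then
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X)$$
in probability (logarithms are base 2).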
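A quick simulation illustrates the statement: for i.i.d. symbols, $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ is a sample mean of the i.i.d. terms $-\log_2 p(X_i)$, so it should settle near the entropy $H$ as $n$ grows. This is a sketch assuming a Bernoulli(p) source with p = 0.3; the helper names and the seed are hypothetical choices made for illustration.

```python
import math
import random

def entropy_bits(p):
    """Binary entropy H(p) in bits for a Bernoulli(p) source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sample_log_prob_rate(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) symbols and return -(1/n) log2 p(x_1..x_n)."""
    ones = sum(rng.random() < p for _ in range(n))
    # The sequence probability depends only on the number of ones.
    log_prob = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
    return -log_prob / n

rng = random.Random(1)  # fixed seed so the run is repeatable
p = 0.3
H = entropy_bits(p)
for n in (10, 100, 10000):
    rate = sample_log_prob_rate(n, p, rng)
    print(f"n={n:6d}  -(1/n) log2 p = {rate:.4f}   H = {H:.4f}")
```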
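The sequences that come up most often (according to the probability law of the r.v.) are the typical sequences. Typicality is defined as follows: the typical set $A_\epsilon^{(n)}$ with respect to $p(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n)$ such that
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$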
So the typical sequences occur with a probability that is in the neighborhood of $2^{-nH}$.
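The typical set $A_\epsilon^{(n)}$ has the following properties:
(1) If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then $H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$.
(2) $P(A_\epsilon^{(n)}) > 1 - \epsilon$ for $n$ sufficiently large.
(3) $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
(4) $|A_\epsilon^{(n)}| \ge (1 - \epsilon)\, 2^{n(H(X)-\epsilon)}$ for $n$ sufficiently large.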
Interpretations: By (1), nearly all of the elements in the typical set are nearly equiprobable. By (2), the typical set occurs with probability near 1 (that is why it is typical!). By (3) and (4), the number of elements in the typical set is nearly $2^{nH}$.
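These claims can be checked numerically for a small source. The sketch below assumes a Bernoulli(p) source and takes a sequence to be $\epsilon$-typical when $-\frac{1}{n}\log_2 p(x_1,\ldots,x_n)$ lies within $\epsilon$ of $H$. Sequences with the same number of ones share one probability, so we loop over that count rather than over all $2^n$ sequences. The function names and the parameters n = 200, p = 0.3, eps = 0.1 are illustrative assumptions.

```python
import math

def entropy_bits(p):
    """Binary entropy H(p) in bits for a Bernoulli(p) source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(n, p, eps):
    """Return (size, probability) of the eps-typical set for n i.i.d. Bernoulli(p) symbols."""
    H = entropy_bits(p)
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        # -(1/n) log2 of the probability of any one sequence with k ones
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if H - eps <= rate <= H + eps:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            # count * 2^(-n * rate), computed in the log domain to avoid overflow
            prob += 2.0 ** (math.log2(count) - n * rate)
    return size, prob

n, p, eps = 200, 0.3, 0.1
H = entropy_bits(p)
size, prob = typical_set_stats(n, p, eps)
print("P(typical set)      =", prob)
print("log2 |typical set|  =", math.log2(size))
print("n(H - eps), n(H + eps) =", n * (H - eps), n * (H + eps))
print("total sequences: 2^n with n =", n)
```

Note that the typical set holds almost all of the probability while containing only a small fraction of the $2^n$ possible sequences, which is the observation behind using it for data compression.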