Maximum Entropy Estimation


Spectrum estimation

A problem of ongoing interest in signal processing is to estimate the spectrum of a signal, given its samples (which are often noisy). A large variety of techniques have been developed for this purpose. If the autocorrelation function

 

\begin{displaymath}
R(k) = E X_i X_{i+k}
\end{displaymath}

 

is known for all $k$, then the spectrum (more strictly, the power spectral density) can be computed as the Fourier transform of the autocorrelation function:

 

 \begin{displaymath}
S(\omega) = \sum_{m=-\infty}^\infty R(m) e^{-jm\omega}\qquad -\pi
< \omega \leq \pi.
\end{displaymath}

 

In practice, we observe only $n$ samples and must estimate the autocorrelation values by an estimator such as

 

\begin{displaymath}
\hat{R}(k) = \frac{1}{n-k}\sum_{i=1}^{n-k} X_i X_{i+k}.
\end{displaymath}

 

This is the periodogram method, and it does not converge to the true power spectrum as $n$ grows large. At large lags $k$, the estimate is based on only a few samples and is therefore unreliable. These inaccuracies can be suppressed by setting the autocorrelation estimates at large lags to zero, but this abrupt truncation introduces spectral artifacts. The autocorrelation function could also be windowed, but that can lead to negative power spectrum estimates.
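
For concreteness, the following minimal sketch (not part of the original notes; the helper names and the noisy-sinusoid example are illustrative) computes the unbiased autocorrelation estimate above and the spectrum obtained by Fourier-transforming it with all lags beyond some maximum set to zero:

import numpy as np

def sample_autocorrelation(x, max_lag):
    """Unbiased estimate R_hat(k) = 1/(n-k) * sum_i x[i] x[i+k], for k = 0..max_lag."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k) for k in range(max_lag + 1)])

def correlogram_psd(x, max_lag, n_freq=512):
    """S(w) = sum_m R_hat(m) e^{-jmw}, with R_hat(m) set to 0 for |m| > max_lag."""
    r = sample_autocorrelation(x, max_lag)
    w = np.linspace(-np.pi, np.pi, n_freq)
    # Since R_hat(-m) = R_hat(m) is real, S(w) = R(0) + 2 * sum_{m>=1} R(m) cos(m w).
    S = r[0] + 2 * np.sum(r[1:, None] * np.cos(np.outer(np.arange(1, max_lag + 1), w)), axis=0)
    return w, S

# Example: a sinusoid in white noise; note S may go slightly negative due to truncation.
rng = np.random.default_rng(0)
n = 1024
x = np.cos(0.3 * np.pi * np.arange(n)) + rng.standard_normal(n)
w, S = correlogram_psd(x, max_lag=64)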

Instead of setting the values to zero, one suggested approach is to set them to values that make the fewest assumptions about the data, i.e., that maximize the entropy rate of the process. If the data are assumed to be stationary and Gaussian, this corresponds (as we will see) to an AR process. This approach (due originally to Burg) is widely applicable. The model-estimation approach that arises is commonly used, for example, for efficient coding of speech parameters.

We first need to look at the entropy rate of a Gaussian process.

\begin{definition}
The {\bf differential entropy rate} of a stochastic process $\{X_i\}$ is
\begin{displaymath}
h(\mathcal{X}) = \lim_{n\rightarrow \infty} \frac{h(X_1, X_2, \ldots, X_n)}{n},
\end{displaymath}
provided that the limit exists.
\end{definition}
For a stationary process the limit exists, and it can equivalently be written as
\begin{displaymath}
h(\mathcal{X}) = \lim_{n\rightarrow \infty} h(X_n|X_{n-1},\ldots,X_1).
\end{displaymath}

 

For a stationary Gaussian process with covariance K we have

 

 \begin{displaymath}
h(X_1,\ldots,X_n) = \frac{1}{2}\log (2\pi e)^n |K^{(n)}|
\end{displaymath}

 

where $K^{(n)}$ is the Toeplitz covariance matrix with entries $R(0), R(1),\ldots, R(n-1)$ along the top row, i.e., $K^{(n)}_{ij} = R(|i-j|)$. As $n\rightarrow \infty$ the density of the eigenvalues of the matrix tends to a limit (Szego's theorem), which is the spectrum of the stochastic process. It has been shown (Kolmogorov) that the entropy rate of a stationary Gaussian stochastic process can be expressed as

 

\begin{displaymath}
h(\mathcal{X}) = \frac{1}{2}\log 2\pi e + \frac{1}{4\pi}\int_{-\pi}^{\pi}
\log S(\lambda)\, d\lambda.
\end{displaymath}

 

Using the formulation $h(\mathcal{X}) = \lim_{n} h(X_n\vert X_{n-1},\ldots,X_1)$, and the fact that a Gaussian conditioned on Gaussians is Gaussian, we conclude that $h(\mathcal{X})$ must equal $\frac{1}{2} \log 2\pi e \sigma^2_\infty$, where $\sigma^2_\infty$ is the variance of the error in the best estimate of $X_n$ given the infinite past.
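
As an illustrative check (the first-order Gauss-Markov example below is an assumption of this note, not taken from the original text), the following sketch compares $\frac{1}{n}h(X_1,\ldots,X_n)$ computed from the Toeplitz covariance matrix with the Kolmogorov spectral formula; for $X_i = a X_{i-1} + Z_i$ with $R(k) = \frac{\sigma^2}{1-a^2}a^{|k|}$, both approach $\frac{1}{2}\log 2\pi e \sigma^2$:

import numpy as np
from scipy.linalg import toeplitz

a, sigma2 = 0.8, 1.0                            # AR(1): X_i = a X_{i-1} + Z_i, Z_i ~ N(0, sigma2)
n = 400
R = (sigma2 / (1 - a**2)) * a**np.arange(n)     # autocorrelation R(k), k = 0..n-1
K = toeplitz(R)                                 # Toeplitz covariance matrix K^(n)

# Per-sample entropy (nats): (1/n) * (1/2) log((2 pi e)^n |K^(n)|)
sign, logdet = np.linalg.slogdet(K)
h_per_sample = 0.5 * (n * np.log(2 * np.pi * np.e) + logdet) / n

# Kolmogorov formula: (1/2) log(2 pi e) + (1/(4 pi)) * integral of log S(lambda)
lam = np.linspace(-np.pi, np.pi, 20001)
S = sigma2 / np.abs(1 - a * np.exp(-1j * lam))**2
h_rate = 0.5 * np.log(2 * np.pi * np.e) + np.trapz(np.log(S), lam) / (4 * np.pi)

print(h_per_sample, h_rate)                     # both close to 0.5 * log(2 pi e * sigma2)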

We can now present Burg's result.
\begin{theorem}
The maximum entropy rate stochastic process $\{X_i\}$ satisfying the correlation constraints
\begin{displaymath}
E X_i X_{i+k} = R(k), \qquad k = 0,1,\ldots,p, \quad \mbox{for all } i,
\end{displaymath}
is the $p$th-order Gauss-Markov process
\begin{displaymath}
X_i = -\sum_{k=1}^p a_k X_{i-k} + Z_i,
\end{displaymath}
where the $Z_i$ are i.i.d. $\sim \mathcal{N}(0,\sigma^2)$ and $a_1,\ldots,a_p$ and $\sigma^2$ are chosen to satisfy the correlation constraints.
\end{theorem}
Note: we have not assumed that $X_i$ is Gaussian, zero-mean, or stationary.


\begin{proof}
(Sketch.) Let $X_1,\ldots,X_n$ be any stochastic process satisfying the given correlation constraints. Its joint entropy is bounded above by that of a Gaussian process with the same covariance. By the chain rule and the fact that conditioning reduces entropy,
\begin{displaymath}
h(X_1,\ldots,X_n) = \sum_{i=1}^n h(X_i|X_{i-1},\ldots,X_1) \leq \sum_{i=1}^n h(X_i|X_{i-1},\ldots,X_{i-p}),
\end{displaymath}
and for a Gaussian process each term on the right depends only on $R(0),\ldots,R(p)$. The $p$th-order Gauss-Markov process with these correlations attains each of these bounds with equality, so the maximum entropy rate stochastic process satisfying the constraints is the Gauss-Markov process.
\end{proof}
Summary: the entropy of a finite segment of a stochastic process is bounded above by the entropy of a Gaussian process with the same covariance, which in turn is bounded above by the entropy of a minimal-order Gauss-Markov process with the given covariance constraints.

Now, how do we select the parameters $a_1,\ldots,a_p$ and $\sigma^2$? Multiply the model equation

 

 \begin{displaymath}
X_i = -\sum_{k=1}^p a_k X_{i-k} + Z_i
\end{displaymath}

 

by $X_{i-l}$ and take expectations:

 

 \begin{displaymath}
R(0) = -\sum_{k=1}^p a_k R(k) + \sigma^2
\end{displaymath}
\begin{displaymath}
R(l) = -\sum_{k=1}^p a_k R(l-k),\qquad l=1,2,\ldots,
\end{displaymath}

 

These are the Yule-Walker equations; solving them for $l=1,2,\ldots,p$ determines $a_1,\ldots,a_p$, and the $l=0$ equation then gives $\sigma^2$.
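
A minimal numerical sketch of solving these equations (the helper name yule_walker is hypothetical, not from the notes), assuming the autocorrelations $R(0),\ldots,R(p)$ are known or have been estimated as above:

import numpy as np
from scipy.linalg import toeplitz, solve

def yule_walker(R, p):
    """Given autocorrelations R = [R(0),...,R(p)], return (a, sigma2) for the model
    X_i = -sum_k a_k X_{i-k} + Z_i, using R(l) = -sum_k a_k R(l-k) for l = 1..p."""
    R = np.asarray(R, dtype=float)
    A = toeplitz(R[:p])                 # matrix with entries R(|l-k|), l, k = 1..p
    a = solve(A, -R[1:p + 1])           # solve for a_1, ..., a_p
    sigma2 = R[0] + a @ R[1:p + 1]      # from R(0) = -sum_k a_k R(k) + sigma^2
    return a, sigma2

# Example (illustrative): fit an AR(p) model to estimated autocorrelations.
# a, sigma2 = yule_walker(sample_autocorrelation(x, p), p)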

Having determined the values of $a_k$ and $\sigma^2$, the maximum entropy spectrum estimate is

 

 \begin{displaymath}
S(\omega) = \frac{\sigma^2}{|1+\sum_{k=1}^p a_k e^{-jk\omega}|^2}.
\end{displaymath}

Show this!
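
As a numerical companion (not a substitute for the derivation requested above), a short sketch that evaluates this maximum entropy spectrum from coefficients produced by the hypothetical yule_walker helper sketched earlier:

import numpy as np

def max_entropy_spectrum(a, sigma2, n_freq=512):
    """S(w) = sigma^2 / |1 + sum_{k=1}^p a_k e^{-jkw}|^2 on a grid of frequencies."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(-np.pi, np.pi, n_freq)
    k = np.arange(1, len(a) + 1)
    denom = 1 + np.sum(a[:, None] * np.exp(-1j * np.outer(k, w)), axis=0)
    return w, sigma2 / np.abs(denom)**2

# Example (illustrative): w, S = max_entropy_spectrum(*yule_walker(R, p))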

Copyright 2008, by the Contributing Authors. Source: USU OpenCourseWare, http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture13_1.htm. This work is licensed under a Creative Commons License.