# Maximum Entropy Estimation


## Spectrum estimation

A problem of ongoing interest in signal processing is to estimate the spectrum of a signal, given its samples (which are often noisy). A large variety of techniques have been developed for this purpose. If the autocorrelation function $R(k) = E\,X_i X_{i+k}$ is known for all $k$, then the spectrum (more strictly, the power spectral density) can be computed as the Fourier transform of the autocorrelation function:

$$S(\lambda) = \sum_{k=-\infty}^{\infty} R(k)\, e^{-ik\lambda}, \qquad -\pi < \lambda \le \pi.$$
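
As a minimal sketch (assuming NumPy and a real autocorrelation with $R(-k) = R(k)$), the Fourier sum can be evaluated by truncating it at a large lag. The helper `spectrum_from_autocorr` is hypothetical, and the AR(1) test case $X_i = \phi X_{i-1} + Z_i$, with $R(k) = \sigma^2\phi^{|k|}/(1-\phi^2)$ and exact spectrum $\sigma^2/|1-\phi e^{-i\lambda}|^2$, is an illustrative choice.

```python
import numpy as np

def spectrum_from_autocorr(R, lam):
    """Evaluate S(lam) = sum_{k=-K}^{K} R(|k|) e^{-i k lam} for a real, symmetric
    autocorrelation given as R[0], R[1], ..., R[K] (lags beyond K are dropped)."""
    R = np.asarray(R, dtype=float)
    k = np.arange(1, len(R))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    # R(k) = R(-k), so the two-sided sum collapses to R(0) + 2 * sum_k R(k) cos(k lam)
    return R[0] + 2.0 * np.cos(np.outer(lam, k)) @ R[1:]

# Illustrative AR(1) process X_i = phi X_{i-1} + Z_i with Var(Z_i) = sigma2
phi, sigma2 = 0.6, 1.0
lags = np.arange(200)                        # truncate the infinite sum at lag 199
R_ar1 = sigma2 * phi ** lags / (1 - phi ** 2)
lam = np.linspace(-np.pi, np.pi, 9)
S_sum = spectrum_from_autocorr(R_ar1, lam)
S_exact = sigma2 / np.abs(1 - phi * np.exp(-1j * lam)) ** 2
print(np.max(np.abs(S_sum - S_exact)))      # truncation error, negligible here
```

Because $\phi^{200}$ is vanishingly small, truncating the sum costs essentially nothing in this example; the difficulty in practice is that $R(k)$ is not known and must be estimated.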

In practice, we observe only $n$ samples $X_1, \dots, X_n$ and can only estimate the autocorrelation values by an estimator such as

$$\hat{R}(k) = \frac{1}{n-k}\sum_{i=1}^{n-k} X_i X_{i+k}.$$

Using all of these estimates in the Fourier transform is the *periodogram* method, and the resulting estimate does not converge to the true power spectrum as $n$ grows. At large lags $k$, the estimate $\hat{R}(k)$ is an average of only $n-k$ products, so it is unreliable. One remedy is to set the autocorrelations at large lags to zero; however, this abrupt truncation introduces spectral artifacts. The autocorrelation function could instead be windowed, but that can lead to negative power spectrum estimates.
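
A minimal NumPy sketch of the estimator $\hat{R}(k)$ above, together with a lag-window spectrum estimate that keeps lags $0, \dots, M$ and zeroes the rest. Both helper names are hypothetical. Zeroing the large lags is itself a rectangular window, and the toy case below (the autocorrelation $R(k) = 0.9^{k}$ of an AR(1) process, cut off after lag 1) shows that it can even give a negative spectrum estimate: $1 + 1.8\cos\lambda = -0.8$ at $\lambda = \pi$.

```python
import numpy as np

def sample_autocorr(x, max_lag):
    """R_hat(k) = (1/(n-k)) * sum_{i=1}^{n-k} x_i x_{i+k} for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of samples")
    # Lag k averages only n - k products, so large lags are poorly estimated.
    return np.array([x[: n - k] @ x[k:] / (n - k) for k in range(max_lag + 1)])

def truncated_spectrum(R_hat, max_lag, lam):
    """Spectrum estimate that keeps lags 0..max_lag and sets all larger lags to zero
    (a rectangular lag window)."""
    R = np.asarray(R_hat[: max_lag + 1], dtype=float)
    k = np.arange(1, len(R))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    return R[0] + 2.0 * np.cos(np.outer(lam, k)) @ R[1:]

print(sample_autocorr([1, 2, 3, 4], 3))     # R_hat ≈ [7.5, 6.667, 5.5, 4.0]

# The full AR(1) autocorrelation 0.9**k has a positive spectrum, but keeping only
# lags 0 and 1 gives S(lam) = 1 + 1.8 cos(lam), which is negative near lam = pi.
R_true = 0.9 ** np.arange(50)
print(truncated_spectrum(R_true, 1, np.pi))
```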

Instead of setting the values to zero, one suggested approach is to set them to the values that make the *fewest assumptions about the data*, i.e., those that maximize the entropy rate of the process. If the data are assumed to be stationary and Gaussian, this corresponds (as we will see) to an autoregressive (AR) process. This approach, due originally to Burg, is widely applicable; the model-estimation method that arises is commonly used, for example, for efficient coding of speech parameters.

We first need to look at the entropy rate of a Gaussian process.

For a stationary Gaussian process with covariance $K$ we have

$$h(X_1, X_2, \dots, X_n) = \frac{1}{2}\log\left((2\pi e)^n \left|K^{(n)}\right|\right),$$

where $K^{(n)}$ is the Toeplitz covariance matrix with entries $R(0), R(1), \dots, R(n-1)$ along the top row, i.e., $K^{(n)}_{ij} = R(|i-j|)$. As $n \to \infty$, the density of the eigenvalues of $K^{(n)}$ tends to a limit (Szegő's theorem), which is the spectrum of the stochastic process. It was shown by Kolmogorov that the entropy rate $h(\mathcal{X}) = \lim_{n\to\infty} \frac{1}{n} h(X_1, \dots, X_n)$ of a stationary Gaussian stochastic process can be expressed as

$$h(\mathcal{X}) = \frac{1}{2}\log 2\pi e + \frac{1}{4\pi}\int_{-\pi}^{\pi} \log S(\lambda)\, d\lambda.$$
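
To see the entropy rate numerically, the sketch below (assuming NumPy; helper names hypothetical) computes $\frac{1}{n} h(X_1, \dots, X_n)$ from the log-determinant of the Toeplitz matrix and compares it with Kolmogorov's integral, approximated by averaging $\log S$ over a uniform grid. The AR(1) example with $\phi = 0.6$, $\sigma^2 = 1$ is an illustrative choice; for it $|K^{(n)}| = R(0)\,\sigma^{2(n-1)}$, so $\frac{1}{n} h$ approaches $\frac{1}{2}\log 2\pi e\sigma^2$ at rate $1/n$.

```python
import numpy as np

def gaussian_block_entropy(R, n):
    """h(X_1, ..., X_n) = 0.5 * log((2 pi e)^n |K^(n)|) in nats, K^(n)_ij = R(|i - j|)."""
    R = np.asarray(R, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    K = R[idx]                                # n x n Toeplitz covariance matrix
    sign, logdet = np.linalg.slogdet(K)       # log-determinant without overflow
    if sign <= 0:
        raise ValueError("K^(n) is not positive definite")
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def kolmogorov_entropy_rate(S, num=4096):
    """0.5 log(2 pi e) + (1/(4 pi)) * integral of log S over one period, on a uniform grid."""
    lam = -np.pi + 2 * np.pi * np.arange(num) / num
    # (1/(4 pi)) * integral = 0.5 * (average of log S over the period)
    return 0.5 * np.log(2 * np.pi * np.e) + 0.5 * np.mean(np.log(S(lam)))

phi, sigma2 = 0.6, 1.0                        # illustrative AR(1) parameters

def S_ar1(lam):
    return sigma2 / np.abs(1 - phi * np.exp(-1j * lam)) ** 2

R_ar1 = sigma2 * phi ** np.arange(400) / (1 - phi ** 2)
for n in (10, 100, 400):
    print(n, gaussian_block_entropy(R_ar1, n) / n)   # (1/n) h(X_1, ..., X_n)
print(kolmogorov_entropy_rate(S_ar1))                # limiting entropy rate
```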

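Using the formulation $h(\mathcal{X}) = \lim_{n\to\infty} h(X_n \mid X_{n-1}, \dots, X_1)$ and the fact that a Gaussian conditioned on Gaussians is Gaussian, we have that $h(X_n \mid X_{n-1}, X_{n-2}, \dots)$ must be the entropy of some Gaussian distribution, namely $\frac{1}{2}\log 2\pi e\,\sigma_\infty^2$, where $\sigma_\infty^2$ is the variance of the error in the best estimate of $X_n$ given the infinite past.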
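
Equating this with Kolmogorov's formula gives $\sigma_\infty^2 = \exp\left(\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S(\lambda)\,d\lambda\right)$. For a Gaussian process the best estimate is linear, so $\sigma_\infty^2$ is also the limit, as $m \to \infty$, of the error variance of the best linear predictor from the $m$ most recent samples. The sketch below (NumPy; hypothetical helper names) checks this on an illustrative MA(1) process $X_i = Z_i + \theta Z_{i-1}$ with $|\theta| < 1$, for which $\sigma_\infty^2 = \sigma^2$.

```python
import numpy as np

def prediction_error_variance(R, m):
    """Error variance of the best linear predictor of X_n from X_{n-1}, ..., X_{n-m},
    for a zero-mean stationary process with autocorrelation R[0], ..., R[m]."""
    R = np.asarray(R, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    K = R[idx]                      # covariance matrix of the m past values
    r = R[1 : m + 1]                # E X_n X_{n-j}, j = 1..m
    w = np.linalg.solve(K, r)       # normal equations for the optimal weights
    return R[0] - r @ w

def kolmogorov_szego_variance(S, num=4096):
    """sigma_inf^2 = exp((1/(2 pi)) * integral of log S), approximated on a uniform grid."""
    lam = -np.pi + 2 * np.pi * np.arange(num) / num
    return np.exp(np.mean(np.log(S(lam))))

# Illustrative MA(1) process X_i = Z_i + theta Z_{i-1} with Var(Z_i) = sigma2
theta, sigma2, m_max = 0.5, 1.0, 30
R_ma1 = np.zeros(m_max + 1)
R_ma1[0], R_ma1[1] = sigma2 * (1 + theta ** 2), sigma2 * theta

def S_ma1(lam):
    return sigma2 * np.abs(1 + theta * np.exp(-1j * lam)) ** 2

for m in (1, 2, 5, m_max):
    print(m, prediction_error_variance(R_ma1, m))   # decreases towards sigma_inf^2
print(kolmogorov_szego_variance(S_ma1))
```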

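We can now present Burg's result: the maximum entropy rate stochastic process $\{X_i\}$ satisfying the constraints

$$E\,X_i X_{i+k} = R(k), \qquad k = 0, 1, \dots, p, \quad \text{for all } i,$$

is the $p$th-order Gauss-Markov process

$$X_i = -\sum_{k=1}^{p} a_k X_{i-k} + Z_i,$$

where the $Z_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ and $a_1, \dots, a_p, \sigma^2$ are chosen to satisfy the constraints.

Note: we have not assumed that $X_i$ is Gaussian, zero-mean, or stationary.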
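
As a numerical illustration of Burg's result for $p = 1$ (a hypothetical example, not taken from the derivation above), take the constraints $R(0) = 1$, $R(1) = 0.4$. Both the AR(1) process $X_i = 0.4X_{i-1} + Z_i$ with $\sigma^2 = 0.84$ (that is, $a_1 = -0.4$) and the MA(1) process $X_i = Z_i + 0.5Z_{i-1}$ with $\sigma^2 = 0.8$ satisfy them; they differ from lag 2 on. The NumPy sketch recovers $R(0)$ and $R(1)$ from each spectrum and evaluates Kolmogorov's formula; the AR(1) process should have the larger entropy rate.

```python
import numpy as np

# Uniform grid over one period; the rectangle rule is very accurate for smooth periodic S.
GRID = -np.pi + 2 * np.pi * np.arange(4096) / 4096

def autocorr_from_spectrum(S_vals, k):
    """R(k) = (1/(2 pi)) * integral S(lam) cos(k lam) dlam, for a real, even spectrum on GRID."""
    return np.mean(S_vals * np.cos(k * GRID))

def entropy_rate(S_vals):
    """Kolmogorov's formula: 0.5 log(2 pi e) + (1/(4 pi)) * integral of log S."""
    return 0.5 * np.log(2 * np.pi * np.e) + 0.5 * np.mean(np.log(S_vals))

# AR(1): X_i = -a_1 X_{i-1} + Z_i with a_1 = -0.4 and sigma^2 = 0.84
S_ar = 0.84 / np.abs(1 - 0.4 * np.exp(-1j * GRID)) ** 2
# MA(1): X_i = Z_i + 0.5 Z_{i-1} with sigma^2 = 0.8
S_ma = 0.8 * np.abs(1 + 0.5 * np.exp(-1j * GRID)) ** 2

for name, S in (("AR(1)", S_ar), ("MA(1)", S_ma)):
    R0, R1 = autocorr_from_spectrum(S, 0), autocorr_from_spectrum(S, 1)
    print(f"{name}: R(0)={R0:.4f}, R(1)={R1:.4f}, entropy rate={entropy_rate(S):.4f} nats")
```

The two processes agree on the constraints, so the gap in entropy rate, $\frac{1}{2}\log(0.84/0.8)$, comes entirely from how each one extends the autocorrelation beyond lag $p$.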

Summary: the entropy of a finite segment of a stochastic process is bounded above by the entropy of a segment of a Gaussian process with the same covariance, which in turn is bounded above by the entropy of the minimal-order Gauss-Markov process satisfying the given covariance constraints.

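Now, how do we select the parameters $a_k$ and $\sigma^2$? Multiply the model $X_i = -\sum_{k=1}^{p} a_k X_{i-k} + Z_i$ by $X_{i-l}$ and take expectations. Because $Z_i$ is uncorrelated with the past values $X_{i-1}, X_{i-2}, \dots$, while $E\,Z_i X_i = \sigma^2$, this gives

$$R(0) = -\sum_{k=1}^{p} a_k R(k) + \sigma^2,$$

$$R(l) = -\sum_{k=1}^{p} a_k R(l-k), \qquad l = 1, 2, \dots, p.$$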

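These are the Yule-Walker equations: $p + 1$ linear equations that determine the $p + 1$ unknowns $a_1, \dots, a_p$ and $\sigma^2$ from $R(0), \dots, R(p)$.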
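
A minimal sketch of solving the Yule-Walker equations with NumPy, in the sign convention $X_i = -\sum_k a_k X_{i-k} + Z_i$ used here. The function name is hypothetical, and `np.linalg.solve` ignores the Toeplitz structure; for large $p$, `scipy.linalg.solve_toeplitz` (which uses the Levinson-Durbin recursion) is a cheaper alternative.

```python
import numpy as np

def yule_walker(R, p):
    """Solve R(l) = -sum_{k=1}^p a_k R(l-k), l = 1..p, then
    sigma^2 = R(0) + sum_{k=1}^p a_k R(k).  Returns (a, sigma2), a = [a_1, ..., a_p]."""
    R = np.asarray(R, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    T = R[idx]                           # p x p Toeplitz matrix, T[l-1, k-1] = R(|l - k|)
    a = -np.linalg.solve(T, R[1 : p + 1])
    sigma2 = R[0] + a @ R[1 : p + 1]
    return a, sigma2

# R = (1, 0.4, 0.16) is the autocorrelation of an AR(1) process, so a_2 should vanish.
a, sigma2 = yule_walker([1.0, 0.4, 0.16], 2)
print(a, sigma2)
```

Fitting a second-order model to first-order data is a quick sanity check: the extra coefficient should come out as zero and $\sigma^2$ should equal $1 - 0.4^2 = 0.84$.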

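Having determined the values of $a_k$ and $\sigma^2$, the spectrum is

$$S(\lambda) = \frac{\sigma^2}{\left|1 + \sum_{k=1}^{p} a_k e^{-ik\lambda}\right|^2}, \qquad -\pi < \lambda \le \pi.$$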

Show this!
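
A numerical check (not a proof) that the spectrum above honours the constraints: the NumPy sketch fits the Yule-Walker model to hypothetical values $R(0), R(1), R(2) = 1, 0.5, 0.1$, evaluates $S(\lambda)$, and inverts it on a uniform grid. The recovered $R(0), R(1), R(2)$ should match the constraints, and $R(3)$ should equal the maximum entropy extension $-a_1R(2) - a_2R(1)$. The helper names are illustrative.

```python
import numpy as np

def yule_walker(R, p):
    """Solve R(l) = -sum_k a_k R(l-k), l = 1..p, and sigma^2 = R(0) + sum_k a_k R(k)."""
    R = np.asarray(R, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    a = -np.linalg.solve(R[idx], R[1 : p + 1])
    return a, R[0] + a @ R[1 : p + 1]

def ar_spectrum(a, sigma2, lam):
    """Maximum entropy spectrum S(lam) = sigma^2 / |1 + sum_k a_k e^{-i k lam}|^2."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    k = np.arange(1, len(a) + 1)
    A = 1 + np.exp(-1j * np.outer(lam, k)) @ np.asarray(a, dtype=float)
    return sigma2 / np.abs(A) ** 2

# Hypothetical constraints R(0), R(1), R(2) for a p = 2 model
R_given = np.array([1.0, 0.5, 0.1])
a, sigma2 = yule_walker(R_given, 2)           # a ≈ [-0.6, 0.2], sigma2 ≈ 0.72

# Invert numerically: R(k) = (1/(2 pi)) * integral S(lam) cos(k lam) dlam
grid = -np.pi + 2 * np.pi * np.arange(4096) / 4096
S = ar_spectrum(a, sigma2, grid)
R_back = np.array([np.mean(S * np.cos(k * grid)) for k in range(4)])
print(R_back)   # first three entries reproduce R_given; the last is the extension R(3)
```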