Linear Minimum Mean-Square Error Filtering
Background
Recall that for random variables $X$ and $Y$ with finite variance, the MSE $E[(Y - g(X))^2]$ is minimized over measurable functions $g$ by $g(X) = E[Y \mid X]$. That is, the best estimate of $Y$ using a measured value of $X$ is the conditional average of $Y$ given that value of $X$. One aspect of this estimate is that the error is orthogonal to the data. More precisely, the error $Y - E[Y \mid X]$ is orthogonal to $X$ and to every function of $X$:
$$E\big[(Y - E[Y \mid X])\, g(X)\big] = 0$$
for all measurable functions $g$. We will assume that $E[g(X)^2] < \infty$.
We want to show that $\hat{Y} = E[Y \mid X]$ minimizes $E[(Y - \hat{Y})^2]$ if and only if
$$E[(Y - \hat{Y})\, g(X)] = 0 \qquad \text{(orthogonality)}$$
for all measurable $g$ such that $E[g(X)^2] < \infty$.

Suppose orthogonality holds, and let $\tilde{Y} = g(X)$ be any other estimate. Then
$$E[(Y - \tilde{Y})^2] = E[(Y - \hat{Y} + \hat{Y} - \tilde{Y})^2] = E[(Y - \hat{Y})^2] + E[(\hat{Y} - \tilde{Y})^2] \geq E[(Y - \hat{Y})^2],$$
since the cross term vanishes: $\hat{Y} - \tilde{Y}$ is a function of $X$, to which the error is orthogonal.

Conversely, suppose for some $g$, $E[(Y - \hat{Y})\, g(X)] = c \neq 0$. Consider the estimate
$$\tilde{Y} = \hat{Y} + \lambda g(X), \qquad \text{where } \lambda = \frac{c}{E[g(X)^2]}.$$
Then
$$E[(Y - \tilde{Y})^2] = E[(Y - \hat{Y})^2] - 2\lambda c + \lambda^2 E[g(X)^2] = E[(Y - \hat{Y})^2] - \frac{c^2}{E[g(X)^2]} < E[(Y - \hat{Y})^2],$$
so $\hat{Y}$ does not minimize the MSE.
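The orthogonality principle can be checked numerically. The following sketch (not from the notes; the jointly Gaussian model and all numeric values are illustrative assumptions) uses the fact that for zero-mean jointly Gaussian $(X, Y)$ the conditional mean is linear, $E[Y \mid X] = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} X$, and verifies that the error is orthogonal to the data and to a nonlinear function of the data:

```python
import numpy as np

# Illustrative model (assumption, not from the notes): X standard normal,
# Y = 0.8 X + 0.6 W with W independent standard normal, so Cov(X, Y) = 0.8
# and Var(X) = 1, giving E[Y | X] = 0.8 X.
rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)

y_hat = 0.8 * x          # conditional-mean estimate E[Y | X]
err = y - y_hat          # estimation error

print(np.mean(err * x))       # ~0: error orthogonal to the data
print(np.mean(err * x**3))    # ~0: orthogonal to a function of the data
```

Both sample averages are near zero (up to Monte Carlo noise), as the orthogonality principle requires.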
Suppose now we are given two random processes $X(t)$ and $Y(t)$ that are statistically related (that is, not independent). Suppose, to begin, that we observe $X(u)$ over the interval $a \leq u \leq b$, and based on the information gained we want to estimate $Y(t)$ for some fixed $t$ as a function of $\{X(u),\ a \leq u \leq b\}$. That is, we form
$$\hat{Y}(t) = h\big(X(u),\ a \leq u \leq b\big)$$
for some functional $h$ mapping the function $\{X(u)\}$ to real numbers.
If $t < b$: We say that the operation of the function is smoothing.
If $t = b$: We say that the operation of the function is filtering.
If $t > b$: We say that the operation of the function is prediction.
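The classification above can be expressed as a small helper (a sketch; the names `t` and `b` follow the notation of the notes, with `b` the right endpoint of the observation interval):

```python
def operation_type(t: float, b: float) -> str:
    """Classify the estimation of Y(t) from data observed on [a, b]."""
    if t < b:
        return "smoothing"   # estimation time inside the observed interval
    if t == b:
        return "filtering"   # estimate at the edge of the data
    return "prediction"      # estimation time beyond the observed data
```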
The error in the estimate is $e(t) = Y(t) - \hat{Y}(t)$. The mean-squared error is $E[e(t)^2]$.

Fact (built on our previous intuition): The MSE is minimized by the conditional expectation
$$\hat{Y}(t) = E\big[Y(t) \mid X(u),\ a \leq u \leq b\big].$$
Furthermore, the orthogonality principle applies: the error $e(t)$ is orthogonal to every function of $\{X(u),\ a \leq u \leq b\}$.
While we know the theoretical result, it is difficult in general to compute the desired conditional expectation. We therefore restrict attention to estimates that are linear in the data. Let $\mathcal{L}(X)$ denote the collection of all combinations of the form $a_0 + \sum_i a_i X(u_i)$ with $u_i \in [a, b]$, together with their limits. Note that $\mathcal{L}(X)$ may include infinite sequences, so we assume mean-square limits. The set $\mathcal{L}(X)$ contains mean-square derivatives, mean-square integrals, and other linear transformations of $\{X(u),\ a \leq u \leq b\}$. (The set $\mathcal{L}(X)$ is the Hilbert space generated by the linear span of $\{X(u),\ a \leq u \leq b\}$.)
Let's now solve
$$\min_{\hat{Y}(t) \in \mathcal{L}(X)} E\big[(Y(t) - \hat{Y}(t))^2\big]. \qquad (*)$$
A couple of important properties:

1. If $\hat{Y}(t) \in \mathcal{L}(X)$, then $\hat{Y}(t)$ solves (*) if and only if
$$E\big[(Y(t) - \hat{Y}(t))\, Z\big] = 0 \quad \text{for all } Z \in \mathcal{L}(X).$$
That is, the error is orthogonal to all elements of $\mathcal{L}(X)$.

2. $E\big[(Y(t) - \hat{Y}(t))\, Z\big] = 0$ for all $Z \in \mathcal{L}(X)$ if and only if
$$E[Y(t) - \hat{Y}(t)] = 0 \quad \text{and} \quad E\big[(Y(t) - \hat{Y}(t))\, X(v)\big] = 0 \quad \text{for all } v \in [a, b].$$
This is a restatement of orthogonality, but for a restricted space.
We take the estimate to be of the form
$$\hat{Y}(t) = \int_a^b h(t, u)\, X(u)\, du + c(t).$$
That is, $\hat{Y}(t)$ is the output of a linear filter driven by $X$, plus a constant. Note that $\hat{Y}(t) \in \mathcal{L}(X)$. By property 2, we must have (1)
$$E[Y(t) - \hat{Y}(t)] = 0,$$
so that
$$c(t) = E[Y(t)] - \int_a^b h(t, u)\, E[X(u)]\, du,$$
and (2)
$$E\big[(Y(t) - \hat{Y}(t))\, X(v)\big] = 0$$
for $a \leq v \leq b$. That is,
$$E[Y(t) X(v)] = \int_a^b h(t, u)\, E[X(u) X(v)]\, du + c(t)\, E[X(v)].$$
This gives us two equations in the unknowns $h$ and $c$. We can eliminate $c$:
$$E[Y(t)X(v)] - E[Y(t)]\,E[X(v)] = \int_a^b h(t, u)\, \big(E[X(u)X(v)] - E[X(u)]\,E[X(v)]\big)\, du,$$
or
$$\mathrm{Cov}\big(Y(t), X(v)\big) = \int_a^b h(t, u)\, \mathrm{Cov}\big(X(u), X(v)\big)\, du,$$
or
$$C_{YX}(t, v) = \int_a^b h(t, u)\, C_X(u, v)\, du, \qquad a \leq v \leq b.$$
The optimal $h(t, u)$ is that which solves this integral equation.
Since we are dealing with covariances, the means have been eliminated. It is frequently assumed that $X$ and $Y$ have zero means. In this case, the covariances are equal to the correlations, and we can write
$$R_{YX}(t, v) = \int_a^b h(t, u)\, R_X(u, v)\, du, \qquad a \leq v \leq b.$$
This equation is called the Wiener-Hopf equation.
An integral equation of this form is called a Fredholm equation. The theory on the existence of solutions of Fredholm integral equations is well known. In practice, solutions are usually obtained numerically.
The solution $h(t, u)$ is sometimes called a Wiener filter. The filter in this case is called a non-causal Wiener filter.
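A numerical solution of the Wiener-Hopf equation amounts to discretizing the observation interval and solving a linear system. The following sketch assumes a zero-mean signal with an exponential autocorrelation observed in additive white noise; the model, grid size, and noise variance are illustrative choices, not from the notes:

```python
import numpy as np

# Discretize [a, b] into m grid points. With zero means, the Wiener-Hopf
# equation R_YX(k, v) = sum_u h(k, u) R_X(u, v) becomes R_X h = r_YX.
m = 50
u = np.arange(m)
R_Y = 0.9 ** np.abs(u[:, None] - u[None, :])   # signal autocorrelation (assumed)
R_X = R_Y + 0.5 * np.eye(m)                    # X = Y + white noise, var 0.5
k = m // 2                                     # estimate Y at the middle point
r_YX = R_Y[k]                                  # cross-correlation E[Y(k) X(v)]

h = np.linalg.solve(R_X, r_YX)                 # discrete Wiener filter weights

# Residual error: C_Y(k, k) - sum_u h(u) C_YX(k, u)
mmse = R_Y[k, k] - h @ r_YX
print(mmse)
```

The residual `mmse` lies strictly between 0 and the prior variance $R_Y(k,k)$: the filter removes some, but not all, of the uncertainty.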
It can be shown that the residual error for the non-causal Wiener filter is
$$E[e(t)^2] = C_Y(t, t) - \int_a^b h(t, u)\, C_{YX}(t, u)\, du.$$
This can be seen as follows:
$$E[e(t)^2] = E\big[e(t)\big(Y(t) - \hat{Y}(t)\big)\big] = E[e(t)\, Y(t)] - E[e(t)\, \hat{Y}(t)].$$
By orthogonality, the last term is 0, which implies that $E[e(t)^2] = E[e(t)\, Y(t)]$. We thus obtain
$$E[e(t)^2] = E\big[\big(Y(t) - \hat{Y}(t)\big)\, Y(t)\big] = C_Y(t, t) - \int_a^b h(t, u)\, C_{YX}(t, u)\, du.$$
The MMSE is sometimes written as $\sigma_e^2(t)$, where $\sigma_e^2(t) = E[e(t)^2]$.
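The two identities in this derivation, $E[e^2] = E[e\,Y]$ and the closed-form residual error, can be verified by Monte Carlo for a discrete model. The sketch below reuses the same assumed signal-plus-noise model as before (exponential signal autocorrelation, white observation noise); all parameters are illustrative:

```python
import numpy as np

# Draw n samples of a zero-mean signal vector Y with covariance R_Y, observe
# X = Y + noise, apply the discrete Wiener filter, and compare the empirical
# MSE with E[e Y] and with the closed-form MMSE.
rng = np.random.default_rng(1)
m, n = 20, 200_000
R_Y = 0.9 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
L = np.linalg.cholesky(R_Y)
Y = rng.standard_normal((n, m)) @ L.T               # signal samples, cov R_Y
X = Y + np.sqrt(0.5) * rng.standard_normal((n, m))  # noisy observations
k = m // 2

R_X = R_Y + 0.5 * np.eye(m)
r_YX = R_Y[k]
h = np.linalg.solve(R_X, r_YX)                      # Wiener filter weights

e = Y[:, k] - X @ h                                 # estimation error
print(np.mean(e**2))                                # empirical MSE
print(np.mean(e * Y[:, k]))                         # E[e Y], equal by orthogonality
print(R_Y[k, k] - h @ r_YX)                         # closed-form MMSE
```

Up to Monte Carlo noise, all three printed values agree.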