
Linear Minimum Mean-Square Error Filtering



Recall that for random variables $X$ and $Y$ with finite variance, the MSE $E[(X - h(Y))^2]$ is minimized by $h(Y) = E[X\vert Y]$. That is, the best estimate of $X$ using a measured value of $Y$ is the conditional average of $X$. One aspect of this estimate is that:

The error is orthogonal to the data.
More precisely, the error $X - E[X\vert Y]$ is orthogonal to $Y$ and to every function of $Y$ :

\begin{displaymath}E[(X - E[X\vert Y])g(Y)] = 0
\end{displaymath}

for all measurable functions $g$. We will assume that $E[g^2(Y)] < \infty$.

We want to show that $h$ minimizes $E[(X - h(Y))^2]$ if and only if $E[(X - h(Y))g(Y)] = 0$ (orthogonality), for all measurable $g$ such that $E[g^2(Y)] <
\infty$ .

First, suppose $h(Y) = E[X\vert Y]$. By iterated expectation,
\begin{displaymath}E[(X-E[X\vert Y])g(Y)] = E\bigl[E[(X-E[X\vert Y])g(Y) \vert Y]\bigr]
= E\bigl[(E[X\vert Y] - E[X\vert Y])g(Y)\bigr] = 0.
\end{displaymath}

Conversely, suppose for some $g$, $E[(X - h(Y))g(Y)] \neq 0$. Consider the estimate
\begin{displaymath}\hhat(Y) = h(Y) + \alpha g(Y),
\end{displaymath}
where
\begin{displaymath}\alpha = \frac{E[(X-h(Y))g(Y)]}{E[g^2(Y)]}.
\end{displaymath}
Then
\begin{displaymath}E[(X - \hhat(Y))^2] = E[(X - h(Y))^2] -
\frac{(E[(X-h(Y))g(Y)])^2}{E[g^2(Y)]} < E[(X-h(Y))^2],
\end{displaymath}
so $h$ is not the minimizer.
Suppose now we are given two random processes $\{X_t\}$ and $\{Y_t\}$ that are statistically related (that is, not independent). Suppose, to begin, that $T=\Rbb$. Suppose we observe $Y$ over the interval $[a,b]$, and based on the information gained we want to estimate $X_t$ for some fixed $t$ as a function of $\{Y_\tau, a \leq \tau \leq b\}$. That is, we form

\begin{displaymath}\Xhat_t = f(\{Y_\tau, a \leq \tau \leq b\})
\end{displaymath}

for some functional $f$ mapping functions to real numbers.

If $t < b$: we say that the operation of the function is smoothing.

If $t = b$: we say that the operation of the function is filtering.

If $t > b$: we say that the operation of the function is prediction.

The error in the estimate is $X_t - \Xhat_t$ . The mean-squared error is $E[(X_t - \Xhat_t)^2]$ .

Fact (built on our previous intuition): The MSE $E[(X_t - \Xhat_t)^2]$ is minimized by the conditional expectation

\begin{displaymath}\Xhat_t = E[X_t\vert Y_\tau, a \leq \tau \leq b].
\end{displaymath}

Furthermore, the orthogonality principle applies: $X_t - E[X_t \vert Y_\tau, a \leq \tau \leq b]$ is orthogonal to every function of $\{Y_\tau, a \leq \tau \leq b\}$ .

While we know the theoretical result, it is difficult in general to compute the desired conditional expectation.

Suppose $\{Y_t\}$ is second order. Let $\Hc_y$ be the set of random variables of the form
\begin{displaymath}Z = \sum_{i=1}^n a_i Y_{t_i} + c
\end{displaymath}
for $n \in \Zbb$ and $a_i, c \in \Rbb$ and $t_i \in [a,b]$.
Note that $\Hc_y$ may include infinite sequences, so we include mean-square limits. The set $\Hc_y$ contains mean-square derivatives, mean-square integrals, and other linear transformations of $\{Y_t, t
\in [a,b]\}$. (The set $\Hc_y$ is the Hilbert space generated by the linear span of $\{Y_t\}$.)

Let's now solve

\begin{displaymath}\min_{\Xhat_t \in \Hc_y} E[(X_t - \Xhat_t)^2]. \qquad (*)
\end{displaymath}

A couple of important properties:
  • If $E[X_t^2] < \infty$ then $\Xhat_t \in \Hc_y$ solves (*) if and only if $E[(X_t - \Xhat_t)Z] = 0$ for all $Z \in \Hc_y$ . That is, the error is orthogonal to all elements of $\Hc_y$ .

\lq\lq If'': Suppose $\Xhat_t \in \Hc_y$ satisfies $E[(X_t - \Xhat_t)Z] = 0$ for all $Z \in \Hc_y$. Then for any $W \in \Hc_y$, since $\Xhat_t - W \in \Hc_y$ the cross term vanishes and
\begin{displaymath}E[(X_t - W)^2] = E[(X_t - \Xhat_t)^2] + E[(\Xhat_t - W)^2]
\geq E[(X_t - \Xhat_t)^2].
\end{displaymath}
\lq\lq Only if'': if the orthogonality condition fails for some $Z \in \Hc_y$, the perturbation argument used above produces a strictly better estimator, which implies the necessity
of the orthogonality condition.

  • $E[(X_t - \Xhat_t)Z] = 0$ for all $Z \in \Hc_y$ if and only if $E[\Xhat_t] = E[X_t]$ and $E[(X_t - \Xhat_t)Y_\tau] = 0$ for all $\tau \in [a,b]$ .

    This is a restatement of orthogonality, but for a restricted space.

\lq\lq If'' (sufficiency): we want to show that $E[(X_t -\Xhat_t)Z] = 0$ for all $Z \in \Hc_y$. For $Z = \sum_{i=1}^n a_i Y_{t_i} + c$, linearity gives
\begin{displaymath}E[(X_t - \Xhat_t)Z] = \sum_{i=1}^n a_i
E[(X_t - \Xhat_t)Y_{t_i}] + c E[(X_t - \Xhat_t)] = 0,
\end{displaymath}
and the result extends to mean-square limits in $\Hc_y$. \lq\lq Only if'' (necessity): take $Z = 1$ and $Z = Y_\tau$.
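These conditions have a direct finite-dimensional analogue: with finitely many observations $Y_1,\dots,Y_m$, matching the means and enforcing $E[(X - \Xhat)Y_i] = 0$ for each $i$ yields the normal equations $C_Y h = c_{XY}$. A sketch with synthetic, illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100_000, 3
# Correlated observation vector Y = (Y_1, ..., Y_m); A is an arbitrary mixing matrix.
A = rng.standard_normal((m, m))
y = rng.standard_normal((n, m)) @ A.T
x = y @ np.array([0.5, -1.0, 2.0]) + 0.3 * rng.standard_normal(n)

C_Y = np.cov(y, rowvar=False)                              # covariance of the data
c_XY = np.array([np.cov(x, y[:, i])[0, 1] for i in range(m)])
h = np.linalg.solve(C_Y, c_XY)                             # normal equations C_Y h = c_XY

xhat = (y - y.mean(axis=0)) @ h + x.mean()                 # affine estimate (means matched)
err = x - xhat

# The error is orthogonal to each (centered) observation.
assert np.all(np.abs((y - y.mean(axis=0)).T @ err / n) < 1e-6)
```

Solving the normal equations is exactly the orthogonality principle restricted to the span of the data, which is why the residual correlations vanish to machine precision.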

Suppose we further restrict $\Xhat_t$ to be of the form

\begin{displaymath}\Xhat_t = \int_a^b h(t,\tau) Y_\tau\, d\tau + c_t.
\end{displaymath}

That is, $\Xhat_t$ is the output of a linear filter driven by $Y_t$ . Note that $\Xhat_t \in \Hc_y$ . By property 2, we must have (1)

\begin{displaymath}E[X_t] = E[\Xhat_t] = \int_a^b h(t,\tau) \mu_Y(\tau)\,d\tau + c_t
\end{displaymath}

so that

\begin{displaymath}c_t = \mu_X(t) - \int_a^b h(t,\tau) \mu_Y(\tau)\,d\tau
\end{displaymath}

and (2):

\begin{displaymath}E[X_t Y_\tau] = E[\Xhat_t Y_\tau]
\end{displaymath}

for $\tau \in [a,b]$ . That is,

\begin{displaymath}R_{XY}(t,\tau) = \int_a^b h(t,\sigma) R_Y(\sigma,\tau)\, d\sigma +
c_t \mu_Y(\tau).
\end{displaymath}

This gives us two equations in the unknowns $c_t$ and $h$ . We can eliminate $c_t$ :

\begin{displaymath}R_{XY}(t,\tau) = \int_a^b h(t,\sigma)(R_Y(\sigma,\tau) -
\mu_Y(\sigma)\mu_Y(\tau))\,d\sigma + \mu_X(t) \mu_Y(\tau)
\end{displaymath}

or

\begin{displaymath}R_{XY}(t,\tau) - \mu_X(t) \mu_Y(\tau) =
\int_a^b h(t,\sigma) C_Y(\sigma,\tau)\,d\sigma
\end{displaymath}


or

\begin{displaymath}C_{XY}(t,\tau) = \int_a^b h(t,\sigma)C_Y(\sigma,\tau)\, d\sigma,
\qquad \tau \in [a,b].
\end{displaymath}

The optimal $h$ is that which solves this integral equation.
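One practical route is to discretize the integral equation on a grid, which reduces it to a linear system in the grid weights $w_k \approx h(t,\sigma_k)\,\Delta$. A sketch, assuming an illustrative Ornstein-Uhlenbeck-type covariance $C_Y(\sigma,\tau) = e^{-\vert\sigma-\tau\vert}$ and one-step-ahead prediction, for which the optimum is known in closed form:

```python
import numpy as np

# Predict X_t = Y_t at t = b + d from observations of Y over [a, b] = [0, 1],
# for the (illustrative) covariance C_Y(s, u) = exp(-|s - u|).
a, b, d, N = 0.0, 1.0, 0.5, 51
s = np.linspace(a, b, N)

C_Y = np.exp(-np.abs(s[:, None] - s[None, :]))   # covariance matrix of the grid samples
c_XY = np.exp(-np.abs((b + d) - s))              # cross-covariance C_XY(b + d, s_k)
w = np.linalg.solve(C_Y, c_XY)                   # discretized version of the integral equation

# Residual error of the discretized estimator: C_X(t,t) - sum_k w_k C_XY(t, s_k).
mmse = 1.0 - c_XY @ w

# This covariance is Markov, so the optimal predictor uses only Y_b,
# giving MMSE = 1 - exp(-2d); the discretization recovers it.
assert abs(mmse - (1 - np.exp(-2 * d))) < 1e-8
```

Because the assumed covariance is Markov, the solved weights concentrate on the endpoint $\sigma = b$; for non-Markov kernels the same solve spreads weight across the whole interval.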

Since we are dealing with covariances, the means have been eliminated. It is frequently assumed that $X_t$ and $Y_t$ have zero means. In this case, the covariances are equal to the correlations, and we can write

\begin{displaymath}\boxed{R_{XY}(t,\tau) = \int_a^b h(t,\sigma)
R_Y(\sigma,\tau)\, d\sigma.}
\end{displaymath}

This equation is called the Wiener-Hopf equation.

An integral equation of this form is called a Fredholm equation. The theory of the existence of solutions to Fredholm integral equations is well known. In practice, solutions are usually found numerically.

The solution $h$ is sometimes called a Wiener filter .

\begin{example}
A Non-Causal Wiener filter. Suppose $a = -\infty$ and $b = \infty$, and suppose $X_t$ and $Y_t$ are jointly WSS, so that $R_{XY}(t,\tau) = R_{XY}(t-\tau)$, $R_Y(\sigma,\tau) = R_Y(\sigma-\tau)$, and $h(t,\sigma) = h(t-\sigma)$. Then the Wiener-Hopf equation becomes the convolution
\begin{displaymath}R_{XY}(\tau) = (h * R_Y)(\tau),
\end{displaymath}
and taking Fourier transforms,
\begin{displaymath}\boxed{H(\omega) = \frac{S_{XY}(\omega)}{S_Y(\omega)}.}
\end{displaymath}
\end{example}
The filter in this case is called a Non-Causal Wiener Filter.
\begin{example}
Suppose
\begin{displaymath}Y_t = S_t + N_t
\end{displaymath}
where the signal $S_t$ and the noise $N_t$ are zero-mean and uncorrelated, and we want to estimate $X_t = S_t$. Then $S_{XY}(\omega) = S_S(\omega)$ and $S_Y(\omega) = S_S(\omega) + S_N(\omega)$, so
\begin{displaymath}H(\omega) = \frac{S_S(\omega)}{S_S(\omega)+S_N(\omega)} \approx
\begin{cases} 1 & S_S/S_N \gg 1 \\
0 & S_S/S_N \ll 1
\end{cases}
\end{displaymath}
\end{example}
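The limiting behavior of the signal-plus-noise filter can be seen numerically. A sketch assuming a hypothetical Lorentzian signal spectrum $S_S(\omega) = 2\lambda/(\lambda^2+\omega^2)$ and white noise $S_N(\omega) = N_0$ (these spectra and parameter values are illustrative, not from the notes):

```python
import numpy as np

# Illustrative spectra: Lorentzian signal, white noise.
lam, N0 = 1.0, 0.5
w = np.linspace(-20, 20, 2001)
S_S = 2 * lam / (lam**2 + w**2)

H = S_S / (S_S + N0)            # H(w) = S_S / (S_S + S_N)

# The filter gain always lies strictly between 0 and 1.
assert np.all((H > 0) & (H < 1))
# Near w = 0 the SNR is high and H is close to 1;
# in the tails the SNR is low and H is close to 0.
assert H[len(w) // 2] > 0.7
assert H[0] < 0.05
```

Plotting `H` against `w` shows the filter acting as a soft low-pass: it passes frequencies where the signal dominates and suppresses those where the noise dominates.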
It can be shown that the residual error for the noncausal Wiener filter is

\begin{displaymath}MMSE = \frac{1}{2\pi} \int_{-\infty}^\infty \left[S_X(\omega) -
\frac{\vert S_{XY}(\omega)\vert^2}{S_Y(\omega)}\right]d\omega.
\end{displaymath}
This can be seen as follows:

\begin{displaymath}E[(X_t - \Xhat_t)^2] = E[(X_t - \Xhat_t)X_t] - E[(X_t -
\Xhat_t)\Xhat_t].
\end{displaymath}
By orthogonality, the last term is 0, which implies that $E[X_t\Xhat_t] = E[\Xhat_t^2]$ . We thus obtain

\begin{displaymath}E[(X_t - \Xhat_t)^2] = E[X_t^2] - E[\Xhat_tX_t] = E[X_t^2] -
E[\Xhat_t^2] = \frac{1}{2\pi}\int_{-\infty}^\infty [S_X(\omega) -
S_{\Xhat}(\omega)]\, d\omega,
\end{displaymath}
and $S_{\Xhat}(\omega) = \vert H(\omega)\vert^2 S_Y(\omega) = \vert S_{XY}(\omega)\vert^2/S_Y(\omega)$.

The MMSE is sometimes written as

\begin{displaymath}MMSE = \frac{1}{2\pi} \int_{-\infty}^\infty
S_X(\omega)[1-\vert\rho_{XY}(\omega)\vert^2]\, d\omega
\end{displaymath}

where

\begin{displaymath}\rho_{XY}(\omega) = \frac{S_{XY}(\omega)}{\sqrt{S_X(\omega)
S_Y(\omega)}}.
\end{displaymath}
For the signal $+$ noise problem, we have
\begin{displaymath}MMSE = \frac{1}{2\pi}\int_{-\infty}^\infty
\frac{S_S(\omega) S_N(\omega)}{S_S(\omega) + S_N(\omega)}\, d\omega.
\end{displaymath}
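This integral can be evaluated numerically for concrete spectra. A sketch reusing the hypothetical Lorentzian signal spectrum $S_S(\omega) = 2\lambda/(\lambda^2+\omega^2)$ and white noise $S_N(\omega) = N_0$ (illustrative assumptions, not from the notes); the residual MSE must lie strictly between zero and the total signal power $\frac{1}{2\pi}\int S_S\,d\omega$:

```python
import numpy as np

# Illustrative spectra: Lorentzian signal, white noise.
lam, N0 = 1.0, 0.5
w = np.linspace(-200, 200, 400_001)
dw = w[1] - w[0]
S_S = 2 * lam / (lam**2 + w**2)

# MMSE = (1/2pi) * int S_S*S_N/(S_S + S_N) dw  (Riemann sum over a wide window).
mmse = np.sum(S_S * N0 / (S_S + N0)) * dw / (2 * np.pi)

# Total signal power (1/2pi) * int S_S dw; approaches 1 as the window grows.
power = np.sum(S_S) * dw / (2 * np.pi)

# Filtering removes some, but not all, of the error.
assert 0 < mmse < power
```

Since the integrand $S_S S_N/(S_S+S_N)$ is pointwise smaller than $S_S$, the MMSE is always less than the signal power, confirming that the Wiener filter strictly improves on the trivial estimate $\Xhat_t = 0$.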

Let us now do the signal $+$ noise problem for a particular signal spectrum. ... This is not a causal filter. Plot for various values of $\lambda$.
Copyright 2008, Todd Moon. admin. (2006, June 07). Linear Minimum Mean-Square Error Filtering. Retrieved January 07, 2011, from Free Online Course Materials — USU OpenCourseWare. This work is licensed under a Creative Commons License.