Programming Assignments


Adaptive filters

Solving for the unknown filter coefficients requires setting up the estimated autocorrelation and cross-correlation information, then solving a set of linear equations. This approach works provided that the system is stationary, that sufficient data are available to make accurate estimates, and that the computation does not take too long. (In real-time settings, the required computation might exceed the available computational resources.)

Overcoming these demands can be accomplished in part by the use of adaptive filters. These are filters that adapt themselves to minimize some mean-squared error criterion. Here we provide a very brief introduction to the topic, by introducing what are known as LMS adaptive filters. (LMS stands for least mean squares.)

Let $ u(t)$ be the input to an FIR filter, with output

$\displaystyle x(t) = \sum_{i=0}^L w_i u(t-i).$

The coefficients $ w_i$ are the FIR filter coefficients, also referred to as filter weights. This filter operation can be represented as

$\displaystyle x(t) = \wbf^T \ubf(t),$

where

$\displaystyle \wbf = \begin{bmatrix}w_0 \\ w_1 \\ \vdots \\ w_L\end{bmatrix}$

is the vector of filter coefficients and

$\displaystyle \ubf(t) = \begin{bmatrix}u(t) \\ u(t-1) \\ \vdots \\ u(t-L)\end{bmatrix}$

is the vector of filter inputs/memory.
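As a quick check that the sum form and the vector form describe the same operation, here is a minimal Python sketch. The weights and input samples are made-up illustrative values, the helper name `fir_output` is hypothetical, and we assume that samples before the start of the record are zero (the text does not specify the initial memory).

```python
import numpy as np

def fir_output(w, u, t):
    """Return x(t) = w^T u(t) for a 0-based time index t.

    Samples u(t-i) with t-i < 0 are taken to be zero (an assumption).
    """
    L = len(w) - 1
    # Build u(t) = [u(t), u(t-1), ..., u(t-L)]^T
    u_vec = np.array([u[t - i] if t - i >= 0 else 0.0 for i in range(L + 1)])
    return float(w @ u_vec)

w = np.array([0.5, 0.3, 0.2])        # illustrative weights w_0, w_1, w_2
u = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative input samples
x = np.array([fir_output(w, u, t) for t in range(len(u))])

# Cross-check against direct convolution, truncated to the input length
x_conv = np.convolve(u, w)[:len(u)]
print(x, x_conv)
```

Both computations should agree, since the inner product with the memory vector is just the convolution sum written one output sample at a time.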

Suppose that some desired signal $ d(t)$ is available, and we wish to filter the input signal $ u(t)$ so that the filter output $ x(t)$ matches the desired signal as closely as possible. That is, we form

$\displaystyle e(t) = x(t) - d(t),$

and choose $ \wbf$ to minimize $ e(t)$ . More precisely, we desire to minimize the mean-squared error in $ e(t)$ :

$\displaystyle \min_{\wbf} E[e^2(t)].$

Exercise 5
Show that $ E[e^2(t)]$ is minimized when

$\displaystyle R\wbf = \pbf$

where $ R = E[\ubf(t) \ubf(t)^T]$ and $ \pbf = E[d(t)\ubf(t)]$ .
By this point, this form of the optimal solution should be familiar.
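For comparison with the adaptive approach, the batch solution can be sketched as follows: estimate $ R$ and $ \pbf$ by time averages and solve the linear system. This is only an illustration under assumed conditions. The "true" weights `w_true`, the white Gaussian input, and the noise level are all hypothetical choices, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 5000, 2
w_true = np.array([1.0, -0.5, 0.25])   # hypothetical weights generating d(t)

u = rng.standard_normal(N)
# Rows of U are u(t)^T = [u(t), u(t-1), ..., u(t-L)] for t = L, ..., N-1
U = np.column_stack([u[L - i : N - i] for i in range(L + 1)])
d = U @ w_true + 0.1 * rng.standard_normal(N - L)

R_hat = U.T @ U / len(U)   # sample estimate of R = E[u(t) u(t)^T]
p_hat = U.T @ d / len(U)   # sample estimate of p = E[d(t) u(t)]
w_hat = np.linalg.solve(R_hat, p_hat)
print(w_hat)
```

With this much data the estimate `w_hat` should land close to `w_true`; the cost is that all the data must be collected first and a linear system solved.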

So far, the filter is not adaptive. We make the filter adaptive as follows. We form an error measure as a function of the filter coefficients:

$\displaystyle J(\wbf) = E[(d(t) - \wbf^T \ubf(t))^2].$

Rather than minimizing all in one step, as before, we compute the gradient of $ J(\wbf)$ with respect to $ \wbf$ , and update the current $ \wbf$ by moving in the direction of the negative gradient. That is, we ``slide downhill'' on the surface of $ J(\wbf)$ . The idea of sliding downhill is conveyed in the following figure:
This figure shows the contours of a function (of two variables). At each point in the plane, the contours are orthogonal to the direction of the gradient, with the gradient pointing in the direction of greatest increase. Thus, starting at an initial point and moving some small distance from the point in the direction of the negative gradient at that point decreases the function value. Then starting at the new point and moving in the direction of the negative gradient at that point again decreases the function value. A series of such steps will eventually reach a local minimum. An update rule such as this is referred to as steepest descent.

We denote the gradient of $ J(\wbf)$ with respect to $ \wbf$ as $ \partiald{}{\wbf} J(\wbf)$ . Based on this, an update rule can be written as

$\displaystyle \wbf^{[k+1]} = \wbf^{[k]} - \frac{\mu}{2} \left.\partiald{}{\wbf} J(\wbf)\right\vert _{\wbf^{[k]}}.$

That is, the filter weights at the next time around, $ \wbf^{[k+1]}$, are obtained by moving from the current weights, $ \wbf^{[k]}$, in the direction of the negative gradient evaluated at the current weights, $ -\left.\partiald{}{\wbf} J(\wbf)\right\vert _{\wbf^{[k]}}$. The quantity $ \mu/2$ is a ``step size,'' indicating how far the move should be.

Exercise 6
Show that

$\displaystyle \partiald{}{\wbf} J(\wbf)$

can be written as

$\displaystyle \partiald{}{\wbf} J(\wbf) = 2 E[(x(t) - d(t))\ubf(t)] = 2 E[e(t) \ubf(t)].$

Hence the weight update rule is

$\displaystyle \wbf^{[k+1]} = \wbf^{[k]} - \mu E[e(t) \ubf(t)].$

The last exercise describes how to update the coefficients of an adaptive filter in such a way that the filter output $ x(t)$ becomes as close as possible to $ d(t)$ in the mean-squared sense (given enough time). Provided the step size $ \mu$ is small enough, the weights eventually converge to the exact MMSE solution.
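To see this convergence numerically, note that with $ e(t) = \wbf^T \ubf(t) - d(t)$ and the definitions of $ R$ and $ \pbf$ from Exercise 5, the expectation in the update is $ E[e(t)\ubf(t)] = R\wbf - \pbf$. The sketch below iterates the update for a small made-up $ R$ and $ \pbf$ (assumed values, chosen only for illustration) and compares the result with the direct solution of $ R\wbf = \pbf$.

```python
import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # assumed autocorrelation matrix
p = np.array([0.7, 0.2])      # assumed cross-correlation vector
mu = 0.1                      # step size, well below 2 / (largest eigenvalue of R)

w = np.zeros(2)               # initial weights w^[0]
for k in range(500):
    expected_eu = R @ w - p   # E[e(t) u(t)] evaluated at the current weights
    w = w - mu * expected_eu  # steepest-descent update

w_opt = np.linalg.solve(R, p) # direct solution of R w = p
print(w, w_opt)
```

For this quadratic cost the iteration is stable when $ 0 < \mu < 2/\lambda_{\max}$, where $ \lambda_{\max}$ is the largest eigenvalue of $ R$; here $ \lambda_{\max} = 1.5$, so $ \mu = 0.1$ is comfortably inside the range.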

However, there is a practical problem with the filter to this point. It requires computing $ E[e(t) \ubf(t)]$ . That is, the expected value must be computed, which requires (theoretically) some kind of probability information, or some kind of ensemble to average over. This is problematic in practice. In the LMS filter, the stochastic gradient approximation is used:

Assume that every instance (draw) of a random variable is equal to the mean value of the random variable.
This seems somewhat reasonable. By the Chebyshev inequality, we would expect that a randomly-drawn value would be close to the mean value. On the other hand, it is not exact: We don't expect any toss of a die (taking outputs in the range $ \{1,2,3,4,5,6\}$ ) to have the value of the mean, which is 3.5!

Under the stochastic gradient approximation, we therefore assume that

$\displaystyle E[e(t) \ubf(t)] \approx e(t) \ubf(t).$

While we don't get precise equality at every time step, over a number of runs, or over a number of time steps, we will be right on average. Under this approximation, the gradient descent does not run strictly downhill in the steepest direction, but it does proceed downhill on average.
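A rough numerical check of the "right on average" claim is sketched below. We hold the weights fixed, form the instantaneous products $ e(t)\ubf(t)$ over many time steps, and compare their average with the exact expectation. The setup is an assumption made for illustration: unit-variance white Gaussian input (so that $ R = I$) and a hypothetical two-tap system `w_true` generating $ d(t)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20000
w_true = np.array([1.0, -0.5])   # hypothetical system generating d(t)
w = np.array([0.2, 0.1])         # fixed trial weights

u = rng.standard_normal(N)
U = np.column_stack([u[1:], u[:-1]])   # rows are [u(t), u(t-1)]
d = U @ w_true
e = U @ w - d                          # e(t) = x(t) - d(t)

inst = e[:, None] * U                  # instantaneous estimates e(t) u(t)
avg = inst.mean(axis=0)                # their average over time
exact = w - w_true                     # R w - p with R = I and p = w_true
print(avg, exact, inst.std(axis=0))
```

Individual rows of `inst` scatter widely around `exact`, but their average should sit close to it, which is the sense in which the stochastic gradient points downhill on average.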

Under this approximation the filter weight update rule is written

$\displaystyle \wbf^{[k+1]} = \wbf^{[k]} - \mu e(t) \ubf(t).$

Putting all the pieces together, we obtain the following algorithm for the LMS adaptive filter:

Starting from some initial filter coefficients $ \wbf^{[0]}$ , and $ k=0$ :

  Given inputs $ u(t)$ and $ d(t)$:
  Form $ \ubf(t) = [u(t), u(t-1), \ldots, u(t-L)]^T$.
  $\displaystyle x(t) = \ubf(t)^T \wbf^{[k]}$ (compute filter output)
  $\displaystyle e(t) = x(t) - d(t)$ (compute error between output and desired)
  $\displaystyle \wbf^{[k+1]} = \wbf^{[k]} - \mu e(t) \ubf(t)$ (update filter coefficients)
  $\displaystyle k \leftarrow k+1$ (increment iteration number)
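The steps above can be sketched in Python as follows. This is one possible arrangement, not a reference implementation: the function name `lms`, the zero initial memory, the zero default for $ \wbf^{[0]}$, and the test signals are all assumptions made for the example.

```python
import numpy as np

def lms(u, d, L, mu, w0=None):
    """Run the LMS recursion; return the final weights and the error sequence."""
    w = np.zeros(L + 1) if w0 is None else np.array(w0, dtype=float)
    u_vec = np.zeros(L + 1)          # filter memory, assumed zero before the data starts
    e = np.zeros(len(u))
    for t in range(len(u)):
        u_vec = np.roll(u_vec, 1)    # shift memory: u(t-1) moves to slot 1, and so on
        u_vec[0] = u[t]              # newest sample goes in slot 0
        x = u_vec @ w                # compute filter output
        e[t] = x - d[t]              # error between output and desired
        w = w - mu * e[t] * u_vec    # update filter coefficients
    return w, e

rng = np.random.default_rng(0)
u = rng.standard_normal(5000)
w_true = np.array([0.8, -0.4, 0.2])      # hypothetical target weights
d = np.convolve(u, w_true)[:len(u)]      # desired signal, noise-free
w, e = lms(u, d, L=2, mu=0.01)
print(w, np.mean(e[-100:] ** 2))
```

Here one weight update is made per time step, so the iteration index $ k$ and the time index $ t$ advance together. In this noise-free example the weights should approach `w_true` and the error should shrink toward zero; with noise in $ d(t)$ the weights would instead hover around the MMSE solution.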

The adaptive filter configuration is often portrayed as in the figure here, where $ W(z)$ represents the adaptive filter,

$\displaystyle W(z) = w_0 + w_1 z^{-1} + \cdots + w_L z^{-L}.$

The dashed line feeding back through the filter box is intended to suggest variability, such as in a variable resistor.

Adaptive predictors

The adaptive filter can be used as an adaptive predictor, that is, a predictor which learns the taps that it needs to predict its input signal.

In this configuration, the input to the filter is $ u(t-1)$ . The output of the filter is interpreted as

$\displaystyle x(t) = \uhat(t\vert t-1),$

that is, the prediction of $ u(t)$ based on previous information. The desired signal is $ d(t) = u(t)$: The filter adapts itself until $ x(t)$ is as close as possible to $ u(t)$, using the information in $ u(t-1), u(t-2), \ldots, u(t-L-1)$.

Thus, the predictor in the figure in section 2.2.1 can be replaced with an adaptive filter.
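A small sketch of the predictor configuration is given below. The test signal is an assumption chosen so that the right answer is known: a first-order autoregressive process $ u(t) = a\,u(t-1) + v(t)$ with a hypothetical coefficient $ a = 0.9$ and unit-variance white noise $ v(t)$, for which the best one-step predictor is $ a\,u(t-1)$. The filter has $ L+1 = 2$ taps acting on $ u(t-1)$ and $ u(t-2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, a = 40000, 0.9                  # a is a hypothetical AR(1) coefficient
u = np.zeros(N)
for t in range(1, N):
    u[t] = a * u[t - 1] + rng.standard_normal()

L, mu = 1, 0.0005
w = np.zeros(L + 1)
for t in range(L + 1, N):
    u_past = u[t - 1 - np.arange(L + 1)]   # [u(t-1), ..., u(t-L-1)]
    x = w @ u_past                          # prediction of u(t)
    e = x - u[t]                            # desired signal is d(t) = u(t)
    w = w - mu * e * u_past                 # LMS weight update
print(w)
```

The first weight should settle near $ a$ and the second near zero, with some residual jitter caused by the stochastic gradient approximation; a smaller $ \mu$ reduces the jitter at the cost of slower adaptation.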

Adaptive identification

The problem of identifying $ G(z)$ to minimize the error in the representation

$\displaystyle e(t) = \ytilde(t) - G(z) \utilde(t)$

can also be expressed using an adaptive filter, as shown here:
Copyright 2008, by the Contributing Authors. admin. (2006, June 13). Programming Assignments. Retrieved January 07, 2011, from Free Online Course Materials, USU OpenCourseWare Web site. This work is licensed under a Creative Commons License.