
Application of Information Theory to Blind Source Separation



The training law we have developed up to this point requires computation of $W^{-T}$. We can modify this by multiplying the gradient on the right by $W^T W$:

$$\Delta W = \eta\left(W^{-T} - \varphi(u)\, x^T\right) W^T W,$$

where $\varphi(u) = -\frac{d}{du}\ln p(u)$ is applied componentwise. This becomes (since $u = Wx$, so $x^T W^T = u^T$, and $W^{-T} W^T = I$)

$$\Delta W = \eta\left(I - \varphi(u)\, u^T\right) W.$$
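To see concretely that the two forms of the update agree, here is a small numerical check (NumPy; the tanh nonlinearity and the matrix sizes are illustrative choices, not fixed by the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
W = rng.normal(size=(n, n))     # current unmixing matrix (random, invertible w.h.p.)
x = rng.normal(size=(n, 1))     # one observation
u = W @ x                       # network output

phi = np.tanh(u)                # illustrative score nonlinearity

# Original update direction: needs the inverse transpose W^{-T}.
grad = np.linalg.inv(W).T - phi @ x.T

# Natural-gradient update: right-multiply by W^T W ...
nat_grad = grad @ W.T @ W

# ... which simplifies to (I - phi(u) u^T) W -- no matrix inverse required.
nat_grad_simplified = (np.eye(n) - phi @ u.T) @ W

assert np.allclose(nat_grad, nat_grad_simplified)
```

Avoiding the inverse is the practical payoff: the simplified form costs only matrix multiplies.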

This modification to the gradient, multiplying on the right by $W^T W$, is called the natural gradient (Amari, 1998). In this section we examine it, with an eye to the question: what is natural about it? Note also that the multiplication changes the scaling of the update formula, in proportion to the current weights. We follow Amari (1998) in the following discussion.

Suppose $\Theta$ is some parameter space (e.g., the space of parameters in the weighting matrix $W$). Suppose there is some function $L(\theta)$ defined on it. Consider a parameter value $\theta$, and some incremental change $d\theta$ to $\theta$. If the parameter space is Euclidean, then the squared length of the increment is

$$|d\theta|^2 = \sum_i (d\theta_i)^2.$$

However, not all parameter spaces are Euclidean. Consider, for example, a case where the parameters all lie on a sphere. Then the appropriate distance measure is not simply the sum of the squares of the coordinate increments, especially if $\theta$ is measured in spherical coordinates! So we measure the change differently:

$$|d\theta|^2 = \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j.$$

Here, $g_{ij}$ is called the Riemannian metric tensor; it describes the local curvature of the parameter space at the point $\theta$. In terms of vectors, we can write

$$|d\theta|^2 = d\theta^T G(\theta)\, d\theta,$$

where $G(\theta) = [g_{ij}(\theta)]$ (a function of $\theta$). $G$ is symmetric. We see that we are simply dealing with a weighted distance, induced from a weighted inner product, defined by

$$\langle a, b\rangle = a^T G(\theta)\, b.$$
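As a concrete instance of a non-Euclidean metric, consider polar coordinates in the plane, where the metric tensor is $G(r,\theta) = \operatorname{diag}(1, r^2)$. A quick numerical sketch (the specific point and displacement are arbitrary):

```python
import numpy as np

# Polar coordinates (r, theta): the metric tensor is G = diag(1, r^2),
# so |d|^2 = dr^2 + r^2 dtheta^2, not the naive dr^2 + dtheta^2.
r, theta = 2.0, 0.5
dr, dtheta = 1e-4, 2e-4

G = np.diag([1.0, r**2])
d = np.array([dr, dtheta])
ds2_metric = d @ G @ d          # weighted squared length d^T G d

# Check against the true Euclidean distance in Cartesian coordinates.
def to_xy(r, t):
    return np.array([r * np.cos(t), r * np.sin(t)])

ds2_euclid = np.sum((to_xy(r + dr, theta + dtheta) - to_xy(r, theta)) ** 2)

assert np.isclose(ds2_metric, ds2_euclid, rtol=1e-3)
```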

When $G = I$, we simply get the Euclidean distance. Now consider the problem of learning by "steepest descent." The question is: do we really go in the right direction if we take into account the curvature of the parameter space? We want to decrease $L$ by moving from $\theta$ in a direction $d\theta$ to obtain $L(\theta + d\theta)$, and do the best possible job with the motion. Let us assume that we have a fixed step length,

$$|d\theta|^2 = d\theta^T G(\theta)\, d\theta = \varepsilon^2,$$

for some small positive $\varepsilon$.
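Minimizing the first-order change in $L$ subject to this fixed step length is a small constrained-optimization exercise; the following sketch (via a Lagrange multiplier) fills in the step:

```latex
% Choose d\theta to minimize L(\theta + d\theta) \approx L(\theta) + \nabla L^T d\theta
% subject to d\theta^T G\, d\theta = \varepsilon^2.
\mathcal{L}(d\theta, \lambda)
  = \nabla L^T d\theta - \lambda\left(d\theta^T G\, d\theta - \varepsilon^2\right),
\qquad
\frac{\partial \mathcal{L}}{\partial(d\theta)}
  = \nabla L - 2\lambda\, G\, d\theta = 0 .
% Hence d\theta \propto G^{-1}\nabla L; taking the sign and scale that
% decrease L with step length \varepsilon:
d\theta = -\,\frac{\varepsilon}{\sqrt{\nabla L^T G^{-1} \nabla L}}\; G^{-1}\nabla L .
```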

Observe that the usual "steepest descent" that we deal with always assumes that $G = I$.
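The sense in which this direction is "steepest" can be checked numerically: among all directions of a fixed Riemannian length, $-G^{-1}\nabla L$ gives the largest first-order decrease in $L$. A small sketch with an arbitrary SPD metric and gradient (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
G = np.array([[4.0, 1.0], [1.0, 2.0]])   # an arbitrary SPD metric (illustrative)
grad = np.array([1.0, -2.0])             # gradient of L at the current point
eps = 1e-3                               # fixed Riemannian step length

def riem_normalize(d):
    """Scale direction d so that its Riemannian length sqrt(d^T G d) is eps."""
    return eps * d / np.sqrt(d @ G @ d)

# Natural-gradient direction, normalized to length eps.
nat = riem_normalize(-np.linalg.solve(G, grad))
best_drop = -(grad @ nat)                # first-order decrease of L along nat

# No random direction of the same Riemannian length decreases L faster.
for _ in range(1000):
    d = riem_normalize(rng.normal(size=2))
    assert -(grad @ d) <= best_drop + 1e-12
```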

We call

$$\tilde\nabla L(\theta) = G^{-1}(\theta)\, \nabla L(\theta)$$

the natural gradient of $L$ in the Riemannian space. In Euclidean space ($G = I$), it is the same as the usual gradient.

Now consider the BSS problem in the context of the natural gradient. We first formulate the problem. We have, as before, signal vectors $s$ with independent components, so that

$$p(s) = \prod_i p_i(s_i),$$

and $x = As$. The output is

$$u = Wx,$$

and we update the matrix by some learning rule

$$W_{t+1} = W_t + \eta\, F(x_t, W_t).$$

Previously, we took the learning update to be $F(x, W) = W^{-T} - \varphi(u)\, x^T$, but this will now change. We observe that in order to obtain equilibrium, the function $F$ must satisfy

$$E\left[F(x, W)\right] = 0 \qquad (2)$$

when $W = A^{-1}$ (we stop changing at the correct answer). Now let $\mathcal{T}$ be an operator that maps a matrix to a matrix, and let

$$\tilde F(x, W) = \mathcal{T}\!\left(F(x, W)\right).$$

If $\mathcal{T}$ is linear and invertible, then $\tilde F$ satisfies (2) exactly when $F$ does (same equilibrium). We want to determine what form the transformation $\mathcal{T}$ should take. Let $dW$ be a small deviation from a matrix $W$ to $W + dW$; then $dW$ constitutes a "vector" starting from the point $W$. Let us define an inner product at $W$, written $\langle dW,\, dW'\rangle_W$.

(Picture a curved surface of matrices, with the vector $dW$ drawn on it at the point $W$.) We can pull the point back, mapping it to another surface, by right-multiplying by $W^{-1}$. Then $W$ maps to $I$, and $W + dW$ maps to

$$I + dX,$$

where

$$dX = dW\, W^{-1}.$$

A deviation $dW$ at $W$ is equivalent to the deviation $dX$ at $I$ under this mapping. The key idea is that we want the metric to be invariant under this mapping: the inner product of $dW$ at $W$ is to be the same as the inner product of $dW\,Y$ at $WY$, for any invertible $Y$. Thus we impose the invariance condition

$$\langle dW,\, dW'\rangle_W = \langle dW\,Y,\, dW'\,Y\rangle_{WY}.$$
In particular, when $Y = W^{-1}$, we have $WY = I$. We define the inner product at $I$ by

$$\langle dX,\, dX'\rangle_I = \operatorname{tr}\!\left(dX^T\, dX'\right) = \sum_{i,j} dX_{ij}\, dX'_{ij},$$

the (unweighted, Euclidean) Frobenius inner product. Under our principle of equivalence (using $dX = dW\,W^{-1}$), we should therefore have

$$\langle dW,\, dW'\rangle_W = \langle dW\,W^{-1},\, dW'\,W^{-1}\rangle_I = \operatorname{tr}\!\left(W^{-T}\, dW^T\, dW'\, W^{-1}\right).$$

It follows that the Riemannian metric tensor has the form

$$\langle dW,\, dW'\rangle_W = \operatorname{tr}\!\left(dW^T\, dW'\, (W^T W)^{-1}\right).$$
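This metric can be verified numerically: the inner product defined through $dX = dW\,W^{-1}$ is exactly invariant under right multiplication by any invertible $Y$, and reduces to the Frobenius inner product at $W = I$. (The matrices below are random illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
W = rng.normal(size=(n, n))
Y = rng.normal(size=(n, n))              # arbitrary right-multiplier (invertible w.h.p.)
dW1 = 1e-3 * rng.normal(size=(n, n))     # two tangent "vectors" at W
dW2 = 1e-3 * rng.normal(size=(n, n))

def inner(dA, dB, A):
    """<dA, dB>_A = tr((dA A^{-1})^T (dB A^{-1})) = tr(dA^T dB (A^T A)^{-1})."""
    Ainv = np.linalg.inv(A)
    return np.trace((dA @ Ainv).T @ (dB @ Ainv))

# Invariance under right multiplication: <dW, dW'>_W = <dW Y, dW' Y>_{WY}.
lhs = inner(dW1, dW2, W)
rhs = inner(dW1 @ Y, dW2 @ Y, W @ Y)
assert np.isclose(lhs, rhs)

# At W = I the metric reduces to the plain Frobenius inner product.
assert np.isclose(inner(dW1, dW2, np.eye(n)), np.sum(dW1 * dW2))
```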

We can determine an explicit form for the natural gradient using the principle of invariance. We interpret the natural gradient $\tilde\nabla L$ as a vector applied at $W$, and the ordinary gradient $\nabla L$ as a vector applied at $I$; the first-order change $dL$ must come out the same either way. Then we must have

$$\langle \tilde\nabla L,\, dW\rangle_W = \operatorname{tr}\!\left(\nabla L^T\, dW\right)$$

for every deviation $dW$. We thus have (using the definition of the inner product)

$$\operatorname{tr}\!\left(W^{-T}\, \tilde\nabla L^T\, dW\, W^{-1}\right) = \operatorname{tr}\!\left(\nabla L^T\, dW\right).$$

Using the commuting properties of the trace, we find

$$\operatorname{tr}\!\left((W^T W)^{-1}\, \tilde\nabla L^T\, dW\right) = \operatorname{tr}\!\left(\nabla L^T\, dW\right).$$

Since this must be true for arbitrary $dW$, we must have

$$(W^T W)^{-1}\, \tilde\nabla L^T = \nabla L^T,$$

or

$$\tilde\nabla L = \nabla L\; W^T W,$$

which is precisely the modification made to the training law at the start of this section: multiplying the gradient on the right by $W^T W$ yields the natural gradient.

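Putting everything together, here is a sketch of natural-gradient BSS on synthetic data: a batch variant of the update $\Delta W = \eta(I - \varphi(u)u^T)W$, with $\varphi = \tanh$, Laplacian sources, and a $2\times 2$ mixing matrix as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000
# Two independent super-Gaussian (Laplacian) sources, linearly mixed by A.
s = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# Batch natural-gradient ICA: W <- W + eta * (I - E[phi(u) u^T]) W,
# with phi = tanh (a common choice for super-Gaussian sources).
W = np.eye(2)
eta = 0.1
for _ in range(500):
    u = W @ x
    phi = np.tanh(u)
    W += eta * (np.eye(2) - (phi @ u.T) / T) @ W

# At a separating solution, P = W A is (up to scale and order) a permutation:
# each row should be dominated by a single entry, each picking a different source.
P = W @ A
dom = np.argmax(np.abs(P), axis=1)
crosstalk = max(abs(P[i, 1 - dom[i]]) / abs(P[i, dom[i]]) for i in range(2))
assert set(dom) == {0, 1} and crosstalk < 0.2
```

Note that no matrix inverse appears in the loop, and the equilibrium $E[\varphi(u)u^T] = I$ fixes the scale of the recovered sources; the recovered order and sign remain arbitrary, as always in BSS.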
Copyright 2008, by the Contributing Authors. admin. (2006, May 17). Application of Information Theory to Blind Source Separation. Retrieved January 07, 2011, from Free Online Course Materials — USU OpenCourseWare Web site: http://ocw.usu.edu/Electrical_and_Computer_Engineering/Information_Theory/lecture4_4.htm. This work is licensed under a Creative Commons License.