The principles of information theory can be applied to the blind source separation problem. We will briefly state the problem, then develop steps toward its solution.
We consider first the case of adapting a processing function g which operates on a scalar X, producing Y = g(X), in order to maximize the mutual information between X and Y. That is, we assume that g(X) = g(X; w, w0) for some parameters w and w0, which are to be chosen to maximize I(X;Y). We assume that g is a deterministic function. We have
\[ I(X;Y) = H(Y) - H(Y|X). \]
But since g is deterministic, H(Y|X) = H(g(X)|X) = 0, so the mutual information is maximized when H(Y) is maximized. (Strictly, if we are dealing with differential entropy, H(Y|X) is not zero when Y is a deterministic function of X. But H(Y|X) is constant with respect to the parameters, so it drops out when we take derivatives with respect to w and w0.) Now, assuming the range of g is restricted (a reasonable assumption), what form should g ideally take? (The CDF of X.) Draw a picture. Recall that for a monotonic transformation y = g(x),
\[ f_Y(y) = \frac{f_X(x)}{\left| dy/dx \right|}. \]
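If g(x) = FX(x), then dy/dx = fX(x), and we get fY(y) = fX(x)/fX(x) = 1 for 0 ≤ y ≤ 1. That is, Y is uniform on [0, 1], which is the maximum-entropy density on a bounded interval. Under the rule for transformations,
\[ H(Y) = -E[\ln f_Y(Y)] = E\left[ \ln \left| \frac{dy}{dx} \right| \right] - E[\ln f_X(X)]. \]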
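The claim that g = FX produces a uniform Y can be checked numerically. The following is a minimal sketch, assuming a Gaussian X (chosen here only for illustration) and a crude histogram estimate of the entropy; the helper names `gaussian_cdf` and `histogram_entropy` are hypothetical.

```python
import math
import numpy as np

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def histogram_entropy(y, bins=20):
    """Crude histogram estimate (in nats) of the differential entropy of samples in [0, 1]."""
    counts, edges = np.histogram(y, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    width = edges[1] - edges[0]
    nz = p > 0
    # Each bin has estimated density p/width, so the estimate is -sum p ln(p/width).
    return -np.sum(p[nz] * np.log(p[nz] / width))

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=20000)   # assumed input distribution

y_matched = gaussian_cdf(x, mu=1.0, sigma=2.0)     # g equals the CDF of X
y_mismatched = gaussian_cdf(x, mu=0.0, sigma=1.0)  # g is a CDF, but not that of X

h_matched = histogram_entropy(y_matched)
h_mismatched = histogram_entropy(y_mismatched)
print("matched g:    mean", y_matched.mean(), "var", y_matched.var(), "H", h_matched)
print("mismatched g: H", h_mismatched)
```

With the matched g, the sample mean and variance of Y should be close to 1/2 and 1/12, the values for a uniform density on [0, 1], and the entropy estimate should be close to its upper bound of 0 nats; the mismatched g piles probability near 0 and 1 and gives a lower entropy.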
But fX(x) does not depend on our parameters, so the term involving it can be ignored when maximizing over w and w0. Of course, we may not know the pdf of X, and may not have the flexibility to choose g freely. However, what is frequently done is to assume a particular functional form for g, and just fill in the parameters. Take
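As an illustration of fitting the parameters of an assumed functional form, the sketch below takes g to be the logistic sigmoid y = 1/(1 + exp(-(w x + w0))). This choice is an assumption made here for the example. For it, ln|dy/dx| = ln|w| + ln y + ln(1 - y), so the gradients of the objective E[ln|dy/dx|] are 1/w + E[x(1 - 2y)] with respect to w and E[1 - 2y] with respect to w0. The function name `infomax_scalar`, the step size, and the logistic-distributed test data are likewise hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def infomax_scalar(x, lr=0.1, steps=3000, w=1.0, w0=0.0):
    """Batch gradient ascent on the sample mean of ln|dy/dx| for y = sigmoid(w*x + w0)."""
    for _ in range(steps):
        y = sigmoid(w * x + w0)
        # Gradients of ln|w| + ln y + ln(1 - y), averaged over the samples.
        grad_w = 1.0 / w + np.mean(x * (1.0 - 2.0 * y))
        grad_w0 = np.mean(1.0 - 2.0 * y)
        w += lr * grad_w
        w0 += lr * grad_w0
    return w, w0

rng = np.random.default_rng(0)
# Assumed test input: logistic-distributed X, whose CDF is exactly a sigmoid
# with w = 1/scale and w0 = -loc/scale, so the ideal g lies in the assumed family.
x = rng.logistic(loc=1.0, scale=0.5, size=5000)

w, w0 = infomax_scalar(x)
y = sigmoid(w * x + w0)
print("w =", w, "w0 =", w0)
print("mean(y) =", y.mean(), "var(y) =", y.var())
```

Because the input here is logistic with location 1 and scale 0.5, the learned parameters should land near w = 2 and w0 = -2, and the output Y should be close to uniform on [0, 1]. For an input whose CDF is not a sigmoid, the same rule only makes Y as uniform as the assumed family allows.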
The effect of this learning rule is to drive the density of Y to be as uniform as possible, given the form of g. We can generalize this to N inputs and N outputs. Suppose we take
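Then, as before, we find
\[ H(\mathbf{y}) = E[\ln |J|] - E[\ln f_X(\mathbf{x})], \]
where J = det[∂y_i/∂x_j] is the Jacobian determinant of the transformation from x to y, and where the second term does not depend upon the parameters. Then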
As explored in the homework,
| (1) |
and similarly,
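Gradient expressions of this kind are easy to get wrong, so a numerical check is useful. The sketch below assumes the N-dimensional map y = sigmoid(W x + w0) with the logistic sigmoid applied elementwise (an assumption made here for illustration). Under that assumption ln|J| = ln|det W| + Σ_i ln(y_i(1 - y_i)), and its gradients for a single sample are (W^T)^{-1} + (1 - 2y) x^T with respect to W and 1 - 2y with respect to w0. The code compares these against central finite differences; all function names and the particular W, w0, and x are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_abs_jacobian(W, w0, x):
    """ln|det(dy/dx)| for y = sigmoid(W @ x + w0), sigmoid applied elementwise."""
    y = sigmoid(W @ x + w0)
    _, logdet = np.linalg.slogdet(W)          # ln|det W|
    return logdet + np.sum(np.log(y * (1.0 - y)))

def analytic_gradients(W, w0, x):
    """Closed-form gradients of ln|J| under the logistic assumption."""
    y = sigmoid(W @ x + w0)
    grad_W = np.linalg.inv(W).T + np.outer(1.0 - 2.0 * y, x)
    grad_w0 = 1.0 - 2.0 * y
    return grad_W, grad_w0

def numerical_gradients(W, w0, x, eps=1e-6):
    """Central finite differences of log_abs_jacobian in W and w0."""
    num_W = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            num_W[i, j] = (log_abs_jacobian(W + E, w0, x)
                           - log_abs_jacobian(W - E, w0, x)) / (2.0 * eps)
    num_w0 = np.zeros_like(w0)
    for i in range(w0.size):
        e = np.zeros_like(w0)
        e[i] = eps
        num_w0[i] = (log_abs_jacobian(W, w0 + e, x)
                     - log_abs_jacobian(W, w0 - e, x)) / (2.0 * eps)
    return num_W, num_w0

# Arbitrary, well-conditioned test point (diagonally dominant W).
W = np.array([[1.0, 0.2, -0.1],
              [0.3, 0.9, 0.4],
              [-0.2, 0.1, 1.1]])
w0 = np.array([0.1, -0.3, 0.2])
x = np.array([0.5, -1.0, 0.7])

grad_W, grad_w0 = analytic_gradients(W, w0, x)
num_W, num_w0 = numerical_gradients(W, w0, x)
print("max |analytic - numerical| for W: ", np.max(np.abs(grad_W - num_W)))
print("max |analytic - numerical| for w0:", np.max(np.abs(grad_w0 - num_w0)))
```

Averaging these per-sample gradients over the data and stepping W and w0 in that direction gives a gradient-ascent learning rule for this assumed nonlinearity. The matrix inverse in the W update is what makes the plain gradient costly for large N.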