# Application of Information Theory to Blind Source Separation


## Introduction

The principles of information theory can be applied to the blind source separation problem. We will briefly state the problem, then develop steps toward its solution.

## Background and some preliminary results

We consider first the case of adapting a processing function *g* which operates on a scalar *X*, producing *Y* = *g*(*X*), in order to maximize the mutual information between *X* and *Y*. That is, we assume that *g*(*X*) = *g*(*X*; *w*, *w*_{0}) for some parameters *w* and *w*_{0}, which are to be chosen to maximize *I*(*X*; *Y*). We assume that *g* is a deterministic function. We have

*I*(*X*; *Y*) = *H*(*Y*) - *H*(*Y*|*X*).

But since *g* is deterministic, *H*(*Y*|*X*) = *H*(*g*(*X*)|*X*) = 0, so the mutual information is maximized when *H*(*Y*) is maximized. (Actually, if we are dealing with differential entropy, this may not be the case. But we will take derivatives, and in any event *H*(*Y*|*X*) is constant.) Now, assuming the range of *g* is restricted (a reasonable assumption), what form should *g* ideally take? The answer is the CDF of *X*, *F*_{X}. Draw a picture. Recall that over a bounded range, entropy is maximized by the uniform distribution.

If *g*(*x*) = *F*_{X}(*x*), then *dy*/*dx* = *f*_{X}(*x*), and by the rule for transformations of random variables,

$$ f_Y(y) = \frac{f_X(x)}{|dy/dx|} = \frac{f_X(x)}{f_X(x)} = 1 \quad \text{for } y \in [0, 1], $$

so *Y* is uniform on [0, 1]. Under the rule for transformations, the output entropy is

$$ H(Y) = -E[\log f_Y(Y)] = E\!\left[\log\left|\frac{dy}{dx}\right|\right] - E[\log f_X(X)]. $$
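This uniformizing property is easy to check numerically. A minimal sketch, assuming a standard normal *X* (an illustrative choice, not from the text): pushing samples of *X* through its own CDF should yield an approximately Uniform[0, 1] output.

```python
import numpy as np
from math import erf, sqrt

# Sample X from a standard normal (assumed distribution for illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)

def norm_cdf(values):
    """CDF of the standard normal: F_X(x) = (1 + erf(x / sqrt(2))) / 2."""
    return np.array([0.5 * (1 + erf(v / sqrt(2))) for v in values])

# Y = F_X(X) should be approximately Uniform[0, 1]:
# mean near 1/2, variance near 1/12.
y = norm_cdf(x)
print(y.mean(), y.var())
```

A uniform output is exactly the entropy-maximizing distribution on the bounded range [0, 1], which is why the CDF is the ideal choice of *g*.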

But *f*_{X}(*x*) does not depend on our parameters, so we can ignore it when maximizing. Of course, we may not know the pdf of *X*, and may not have the flexibility to choose *g* to match its CDF. However, what is frequently done is to assume a particular functional form, and just fill in the parameters. Take, for example, the logistic sigmoid

$$ y = g(x; w, w_0) = \frac{1}{1 + e^{-(wx + w_0)}}. $$

Then $dy/dx = w\,y(1-y)$, so $\log|dy/dx| = \log|w| + \log y + \log(1-y)$, and stochastic gradient ascent on $\log|dy/dx|$ gives the learning rules

$$ \Delta w \propto \frac{1}{w} + x(1 - 2y), \qquad \Delta w_0 \propto 1 - 2y. $$
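A minimal numerical sketch of this scheme, assuming the commonly used logistic form $y = 1/(1 + e^{-(wx + w_0)})$, for which gradient ascent on $\log|dy/dx|$ gives $\Delta w \propto 1/w + x(1-2y)$ and $\Delta w_0 \propto 1 - 2y$ (the input distribution below is illustrative):

```python
import numpy as np

# Online infomax adaptation of a scalar logistic unit,
# y = 1 / (1 + exp(-(w*x + w0))), with the stochastic-gradient rules
#   dw  ∝ 1/w + x*(1 - 2y),   dw0 ∝ 1 - 2y.
rng = np.random.default_rng(1)
w, w0, eta = 1.0, 0.0, 0.01

for _ in range(20_000):
    x = rng.normal(2.0, 1.0)              # assumed input: N(2, 1)
    y = 1.0 / (1.0 + np.exp(-(w * x + w0)))
    w += eta * (1.0 / w + x * (1.0 - 2.0 * y))
    w0 += eta * (1.0 - 2.0 * y)

# At a stationary point E[1 - 2Y] = 0, so the output is centered: E[Y] ≈ 1/2.
xs = rng.normal(2.0, 1.0, 10_000)
ys = 1.0 / (1.0 + np.exp(-(w * xs + w0)))
print(ys.mean())
```

Note how the bias rule drives the average output to 1/2 while the gain rule stretches the sigmoid to spread the output over [0, 1].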

The effect of this learning rule is to drive *Y* to be as uniform as possible, given the form of *g*. We can generalize this to *N* inputs and *N* outputs. Suppose we take

$$ \mathbf{y} = g(W\mathbf{x} + \mathbf{w}_0), $$

with *g* applied componentwise. As before,

*I*(*X*; *Y*) = *H*(*Y*) - *H*(*Y*|*X*) = *H*(*Y*).

We want to determine *W* and *w*_{0} to maximize the joint entropy of the output, *H*(*Y*). Here *W* is a matrix and *w*_{0} is a vector. We have the pdf transformation equation

$$ f_Y(\mathbf{y}) = \frac{f_X(\mathbf{x})}{|J|}, $$

where *J* is the Jacobian of the transformation,

$$ J = \det\!\left[\frac{\partial y_i}{\partial x_j}\right]. $$

Then, as before, we find

$$ H(Y) = E[\log|J|] - E[\log f_X(X)], $$

where the second term does not depend upon the parameters. Then

$$ \Delta W \propto \frac{\partial}{\partial W}\log|J|. $$
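For the logistic network, carrying this gradient through yields the standard infomax/ICA update $\Delta W \propto (W^T)^{-1} + (1 - 2\mathbf{y})\mathbf{x}^T$ (the Bell-Sejnowski rule; the bias term is omitted here for brevity). A minimal sketch on two mixed super-Gaussian sources, with an illustrative mixing matrix, verifying that the update increases the empirical objective $E[\log|J|]$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent Laplacian (super-Gaussian) sources, linearly mixed.
S = rng.laplace(size=(2, 1500))
A = np.array([[1.0, 0.8], [0.8, 1.0]])    # assumed mixing matrix (illustrative)
X = A @ S

def objective(W, X):
    """Empirical E[log|J|] = log|det W| + mean over samples of sum_i log y_i(1-y_i)."""
    Y = 1.0 / (1.0 + np.exp(-(W @ X)))
    return np.log(abs(np.linalg.det(W))) + np.log(Y * (1 - Y)).sum(axis=0).mean()

W, eta = np.eye(2), 0.01
start = objective(W, X)
for _ in range(30):                        # epochs of online updates
    for t in range(X.shape[1]):
        x = X[:, [t]]                      # column vector
        y = 1.0 / (1.0 + np.exp(-(W @ x)))
        W += eta * (np.linalg.inv(W.T) + (1 - 2 * y) @ x.T)
end = objective(W, X)
print(start, end)                          # gradient ascent raises E[log|J|]
```

The $(W^T)^{-1}$ term is exactly the derivative of $\log|\det W|$ treated in the homework.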

As explored in the homework,

$$ \frac{\partial}{\partial W}\log|\det W| = (W^T)^{-1}, \qquad (1) $$

and similarly,