Choosing the Contrast Function in Practice


The theoretical analysis given above provides some guidelines for the choice of G. In practice, however, other criteria are also important, in particular the following two.

First, there is computational simplicity: the contrast function should be fast to compute. Polynomial functions tend to be faster to compute than, say, the hyperbolic tangent. However, non-polynomial contrast functions can be replaced by piecewise linear approximations without losing the benefits of non-polynomial functions.
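The idea of a piecewise linear approximation can be sketched as follows, assuming NumPy. The sketch tabulates $g_1(u)=\tanh(a_1 u)$ once at evenly spaced knots and then evaluates it by linear interpolation with `np.interp`. The helper name `make_piecewise_linear`, the range $[-4, 4]$ and the 17 knots are illustrative choices, not taken from the text. In NumPy, `np.tanh` is already vectorized and fast, so this only shows the accuracy side of the trade-off; any speed gain would come in low-level or embedded implementations where evaluating tanh is expensive.

```python
import numpy as np

def g1_exact(u, a1=1.0):
    """Derivative of G1: tanh(a1 * u)."""
    return np.tanh(a1 * u)

def make_piecewise_linear(func, lo=-4.0, hi=4.0, n_knots=17):
    """Hypothetical helper: tabulate `func` at evenly spaced knots once,
    then evaluate by linear interpolation between knots.
    Outside [lo, hi], np.interp holds the end values constant,
    which suits saturating functions such as tanh."""
    knots = np.linspace(lo, hi, n_knots)
    values = func(knots)

    def approx(u):
        return np.interp(u, knots, values)

    return approx

g1_pwl = make_piecewise_linear(g1_exact)

# Worst-case deviation from the exact tanh over a range wider than the table.
u = np.linspace(-6.0, 6.0, 2001)
max_err = np.max(np.abs(g1_pwl(u) - g1_exact(u)))
print(f"max |error| on [-6, 6]: {max_err:.4f}")
```

With a knot spacing of 0.5, the interpolation error is bounded by roughly $h^2/8 \cdot \max|g_1''| \approx 0.024$; more knots shrink it quadratically.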

The second point to consider is the order in which the components are estimated, if one-by-one estimation is used. We can influence this order because the basins of attraction of the maxima of the contrast function have different sizes: any ordinary optimization method tends to find first the maxima that have large basins of attraction. Of course, this order cannot be determined with certainty, but a suitable choice of the contrast function means that independent components with certain distributions tend to be found first. This point is, however, so application-dependent that we cannot say much in general.

We thus reach the following general conclusion. There are basically three choices for the contrast function (for future use, we also give their derivatives):

$$G_1(u)=\frac{1}{a_1}\log\cosh(a_1 u), \qquad g_1(u)=\tanh(a_1 u) \tag{14}$$
$$G_2(u)=-\frac{1}{a_2}\exp(-a_2 u^2/2), \qquad g_2(u)=u\exp(-a_2 u^2/2) \tag{15}$$
$$G_3(u)=\frac{1}{4}u^4, \qquad g_3(u)=u^3 \tag{16}$$
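Equations (14)-(16) translate directly into code. The sketch below assumes NumPy and uses $a_1 = a_2 = 1$ as defaults, within the ranges given in the text. It checks each $g_i$ against a central finite-difference derivative of $G_i$. One implementation detail is an assumption of this sketch, not of the text: $\log\cosh x$ is computed as $\mathrm{logaddexp}(x, -x) - \log 2$, which is mathematically identical but does not overflow for large $|x|$ the way `np.cosh` does.

```python
import numpy as np

# Contrast functions G_i and their derivatives g_i, Eqs. (14)-(16).

def G1(u, a1=1.0):
    # log cosh(x) = log(e^x + e^-x) - log 2, computed without overflow.
    return (np.logaddexp(a1 * u, -a1 * u) - np.log(2.0)) / a1

def g1(u, a1=1.0):
    return np.tanh(a1 * u)

def G2(u, a2=1.0):
    return -np.exp(-a2 * u**2 / 2) / a2

def g2(u, a2=1.0):
    return u * np.exp(-a2 * u**2 / 2)

def G3(u):
    return u**4 / 4

def g3(u):
    return u**3

# Check each g_i against a central finite-difference derivative of G_i.
u = np.linspace(-3.0, 3.0, 61)
h = 1e-5
for G, g in [(G1, g1), (G2, g2), (G3, g3)]:
    numeric = (G(u + h) - G(u - h)) / (2 * h)
    assert np.allclose(numeric, g(u), atol=1e-6), G.__name__
print("derivatives agree")
```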

where $1\leq a_1\leq 2$ and $a_2\approx 1$ are constants; piecewise linear approximations of (14) and (15) may also be used. The benefits of the different contrast functions may be summarized as follows:

$G_1$ is a good general-purpose contrast function.
When the independent components are highly super-Gaussian, or when robustness is very important, $G_2$ may be better.
If computational overhead must be reduced, piecewise linear approximations of $G_1$ and $G_2$ may be used.
Using kurtosis, or $G_3$, is justified on statistical grounds only for estimating sub-Gaussian independent components when there are no outliers.
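As a rough illustration of the robustness point (a numerical sketch, not an analysis from the text), the code below measures how far a single gross outlier moves the sample average of each $g_i$. It assumes NumPy, $a_1 = a_2 = 1$, and Gaussian background data; the seed, sample size and outlier value of 20 are arbitrary choices.

```python
import numpy as np

# Derivatives g_i of the contrast functions in Eqs. (14)-(16), with a1 = a2 = 1.
def g1(u, a1=1.0):
    return np.tanh(a1 * u)

def g2(u, a2=1.0):
    return u * np.exp(-a2 * u**2 / 2)

def g3(u):
    return u**3

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)   # well-behaved background data (assumed Gaussian)
y_out = np.append(y, 20.0)      # the same data plus one gross outlier

# How far does one outlier move the sample average of g(y)?
shifts = {}
for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]:
    shifts[name] = abs(np.mean(g(y_out)) - np.mean(g(y)))
    print(f"{name}: shift in sample mean = {shifts[name]:.3g}")
```

Because $g_1$ saturates at $\pm 1$ and $g_2$ decays to zero for large $|u|$, one outlier can move their averages by at most a few multiples of $1/n$, whereas the cubic $g_3$ lets the outlier at $u = 20$ contribute $20^3/n \approx 8$ on its own.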

Finally, we emphasize that, in contrast to many other ICA methods, our framework provides estimators that work for (practically) any distribution of the independent components and for any choice of the contrast function. The choice of the contrast function is important only if one wants to optimize the performance of the method.


Aapo Hyvarinen
1999-04-23