- In section 4.6 (part 5, Haykin p. 181) it is mentioned that the
inputs should be normalized to accelerate the convergence of the
back-propagation learning process by preprocessing them as follows: 1)
their mean should be close to zero, 2) the input variables should be
uncorrelated, and 3) the covariances of the decorrelated inputs should
be approximately equal.
- Devise a method based on principal component analysis (PCA) that performs these steps.
- Is the proposed method unique?
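As a concrete illustration of the three preprocessing steps, here is a minimal NumPy sketch (the function name `pca_whiten` and the regularizer `eps` are introduced here, not part of the exercise): centre the data, rotate it onto the principal components to decorrelate it, then rescale each component to unit variance.

```python
import numpy as np

def pca_whiten(X, eps=1e-12):
    """Centre, decorrelate, and variance-equalize the rows of X (n_samples x n_dims)."""
    Xc = X - X.mean(axis=0)                 # step 1: make the mean (close to) zero
    C = np.cov(Xc, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigendecomposition of C
    Z = Xc @ eigvecs                        # step 2: project onto principal axes -> uncorrelated
    Z /= np.sqrt(eigvals + eps)             # step 3: equalize the variances (unit covariance)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.5]])
Z = pca_whiten(X)
print(np.allclose(np.cov(Z, rowvar=False), np.eye(3), atol=1e-6))
```

Note that the whitened covariance stays the identity under any further orthogonal rotation of `Z`, which is worth keeping in mind for the uniqueness question.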

- A continuous function $f(x)$ can be approximated with
a step function in the closed interval $[a, b]$, as
illustrated in Figure 1.
- Show how a single column, that is, a function of height $h$ in the interval $[t, t + w]$ and zero elsewhere, can be constructed with a two-layer MLP. Use two hidden units and the sign function as the activation function. The activation function of the output unit is taken to be linear.
- Design a two-layer MLP consisting of such simple sub-networks which approximates the function $f(x)$ with a precision determined by the width $w$ and the number of the columns.
- How does the approximation change if tanh is used instead of sign as an activation function in the hidden layer?
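The two constructions can be sketched numerically as follows (the symbols `t`, `w`, `h` and the test function `np.sin` are assumptions for illustration): two sign-activation hidden units with a linear output give a column of height $h$ on $(t, t + w)$, and a sum of such sub-networks gives the step-function approximation.

```python
import numpy as np

def column(x, t, w, h):
    """Two-hidden-unit sub-network: (h/2) * [sign(x - t) - sign(x - t - w)]."""
    return 0.5 * h * (np.sign(x - t) - np.sign(x - t - w))

def step_mlp(x, f, a, b, n_cols):
    """Sum of n_cols column sub-networks approximating f on [a, b]."""
    edges = np.linspace(a, b, n_cols + 1)
    y = np.zeros_like(x, dtype=float)
    for t0, t1 in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (t0 + t1)               # column height = f at the column midpoint
        y += column(x, t0, t1 - t0, f(mid))
    return y

x = np.linspace(0.01, 0.99, 200)            # avoid column edges, where sign(0) = 0
approx = step_mlp(x, np.sin, 0.0, 1.0, 50)
print(np.max(np.abs(approx - np.sin(x))))   # error shrinks as n_cols grows
```

Replacing `np.sign` with `np.tanh` in `column` smooths the column edges, which is the effect asked about in the last part.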

- An MLP is used for a classification task. The number of classes
is $M$ and the classes are denoted with
$C_1, \dots, C_M$. Both the input vector $\mathbf{x}$
and the corresponding
class $C$ are random variables, and they are assumed to have a joint probability distribution
$p(\mathbf{x}, C)$. Assume that we have so many training
samples that the back-propagation algorithm minimizes the following
expectation value:

$E\left\{ \sum_{j=1}^{M} \left[ y_j(\mathbf{x}) - d_j \right]^2 \right\},$

where $y_j(\mathbf{x})$ is the actual response of the $j$th output neuron and $d_j$ is the desired response.
- Show that the theoretical solution of the minimization problem is

$y_j(\mathbf{x}) = E\{ d_j \mid \mathbf{x} \}.$

- Show that if $d_j = 1$ when $\mathbf{x}$
belongs to class $C_j$ and $d_j = 0$
otherwise, the theoretical solution can be written

$y_j(\mathbf{x}) = P(C_j \mid \mathbf{x}),$

which is the optimal solution in a Bayesian sense.
- Sometimes the number of output neurons is chosen to be less than the number of classes. The classes can then be coded with a binary code. For example, in the case of 8 classes and 3 output neurons, the desired output for class $C_1$ is $(0, 0, 0)$, for class $C_2$ it is $(0, 0, 1)$, and so on. What is the theoretical solution in such a case?
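The claim behind this problem can be checked numerically on a toy example (the discrete joint distribution `p` below is invented for illustration): for each input, the expected squared error is minimized by the conditional mean of the desired response, which for 0/1 targets is the posterior class probability.

```python
import numpy as np

# p[i, j] = P(x = x_i, class = C_j): 2 discrete inputs, 3 classes (toy numbers)
p = np.array([[0.10, 0.25, 0.05],
              [0.30, 0.10, 0.20]])

posterior = p / p.sum(axis=1, keepdims=True)   # P(C_j | x_i)

def expected_sq_error(y):
    """E over (x, C) of sum_j (y[i, j] - d_j)^2 with d_j = 1 iff C = C_j."""
    err = 0.0
    for i in range(p.shape[0]):
        for j in range(p.shape[1]):            # true class C_j for input x_i
            d = np.eye(p.shape[1])[j]          # one-hot desired response
            err += p[i, j] * np.sum((y[i] - d) ** 2)
    return err

best = expected_sq_error(posterior)
rng = np.random.default_rng(1)
# No randomly chosen output matrix achieves a lower error than the posteriors.
others = [expected_sq_error(rng.random(p.shape)) for _ in range(1000)]
print(best <= min(others))
```

The same enumeration with 3-bit binary codes instead of one-hot targets is a quick way to form a conjecture for the last part.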

Jarkko Venna 2005-04-13