Tik-61.261 Principles of Neural Computing
Raivio, Venna

Exercise 4
  1. Let the error function be

    $\displaystyle {\cal E}(\mathbf{w})=w_1^2+10w_2^2,$    

    where $ w_1$ and $ w_2$ are the components of the two-dimensional parameter vector $ \mathbf{w}$. Find the minimum value of $ {\cal E}(\mathbf{w})$ by applying the steepest descent method. Use $ \mathbf{w}(0)=[1,1]^T$ as the initial value for the parameter vector and the following constant values for the learning rate:
    1. $ \alpha=0.04$
    2. $ \alpha=0.1$
    3. $ \alpha=0.2$
    4. What is the condition for the convergence of this method?
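
    The iteration in this exercise can also be checked numerically. Below is a minimal Python/NumPy sketch, assuming nothing beyond the quadratic error function given above (the function names are illustrative, not part of the exercise):

        import numpy as np

        def grad_E(w):
            # Gradient of E(w) = w1^2 + 10*w2^2.
            return np.array([2.0 * w[0], 20.0 * w[1]])

        def steepest_descent(alpha, n_steps=50):
            w = np.array([1.0, 1.0])          # w(0) = [1, 1]^T
            for _ in range(n_steps):
                w = w - alpha * grad_E(w)     # w(k+1) = w(k) - alpha * grad E(w(k))
            return w

        for alpha in (0.04, 0.1, 0.2):
            print(f"alpha = {alpha}: w(50) = {steepest_descent(alpha)}")

    Comparing the printed iterates for the three learning rates indicates which cases approach the minimum at the origin and which do not, which is useful for checking the convergence condition asked for in part 4.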


  2. Show that the application of the Gauss-Newton method to the error function

    $\displaystyle {\cal E} (\mathbf{w})= \frac{1}{2} \left[ \delta \Vert \mathbf{w} - \mathbf{w}(n) \Vert^2 + \sum_{i=1}^n e_i^2(\mathbf{w}) \right]$    

    yields the following update rule for the weights:

    $\displaystyle \Delta \mathbf{w}=-\left[ \mathbf{J}^T(\mathbf{w})\mathbf{J}(\mathbf{w}) + \delta \mathbf{I} \right]^{-1} \mathbf{J}^T(\mathbf{w})\mathbf{e}(\mathbf{w}).$

    All quantities are evaluated at iteration step $ n$. (Haykin 3.3)
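
    The claimed update can be evaluated numerically on a toy problem. Here is a minimal sketch, assuming a hypothetical linear residual model $ e_i(\mathbf{w})=d_i-(w_0+w_1 x_i)$ and made-up data (all names and values are illustrative):

        import numpy as np

        # Made-up data for a small curve-fitting example.
        x = np.array([0.0, 1.0, 2.0, 3.0])
        d = np.array([1.0, 2.2, 2.9, 4.1])

        def residuals(w):
            # e_i(w) = d_i - (w0 + w1 * x_i)
            return d - (w[0] + w[1] * x)

        def jacobian(w):
            # J[i, j] = d e_i / d w_j  (here independent of w)
            return np.column_stack([-np.ones_like(x), -x])

        def gauss_newton_step(w, delta):
            J = jacobian(w)
            e = residuals(w)
            # Delta w = -(J^T J + delta * I)^{-1} J^T e
            return -np.linalg.solve(J.T @ J + delta * np.eye(len(w)), J.T @ e)

        w = np.array([0.0, 0.0])
        print(gauss_newton_step(w, delta=0.1))

    The term $ \delta \mathbf{I}$ keeps the matrix being inverted positive definite even when $ \mathbf{J}^T\mathbf{J}$ is singular, which is the role of the regularizing term in the modified error function.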


  3. The normalized LMS algorithm is described by the following recursion for the weight vector:

    $\displaystyle \Hat{\mathbf{w}}(n+1)= \Hat{\mathbf{w}}(n) +\frac{\eta e(n) \mathbf{x}(n)} {\Vert \mathbf{x}(n) \Vert^2},$    

    where $ \eta$ is a positive constant and $ \Vert\mathbf{x}(n) \Vert$ is the Euclidean norm of the input vector $ \mathbf{x}(n)$. The error signal $ e(n)$ is defined by

    $\displaystyle e(n)=d(n)-{\Hat{\mathbf{w}}(n)}^T \mathbf{x}(n),$    

    where $ d(n)$ is the desired response. For the normalized LMS algorithm to be convergent in the mean square, show that $ 0< \eta <2$. (Haykin 3.5)
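
    A minimal simulation sketch of this recursion, assuming a made-up linear data model $ d(n)=\mathbf{w}^T\mathbf{x}(n)+$noise (the target vector, noise level, and step sizes below are illustrative):

        import numpy as np

        rng = np.random.default_rng(0)
        w_true = np.array([1.0, -2.0, 0.5])          # unknown system to identify

        def normalized_lms(eta, n_iter=2000):
            w_hat = np.zeros(3)
            errors = []
            for _ in range(n_iter):
                x = rng.standard_normal(3)                     # input vector x(n)
                d = w_true @ x + 0.01 * rng.standard_normal()  # desired response d(n)
                e = d - w_hat @ x                              # error e(n)
                w_hat = w_hat + eta * e * x / (x @ x)          # normalized LMS update
                errors.append(e**2)
            return w_hat, np.mean(errors[-200:])

        for eta in (0.1, 1.0, 1.9, 2.5):
            w_hat, mse = normalized_lms(eta)
            print(f"eta = {eta}: average squared error over last 200 steps = {mse:.3g}")

    In this simulation, step sizes inside the interval in question settle near the noise floor, while values outside it make the squared error grow without bound (the printed average overflows).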


  4. The ensemble-averaged counterpart to the sum of error squares viewed as a cost function is the mean-square value of the error signal:

    $\displaystyle J(\mathbf{w})=\frac{1}{2}E[e^2(n)]=\frac{1}{2}E[(d(n)-\mathbf{x}^T(n)\mathbf{w})^2].$    

    1. Assuming that the input vector $ \mathbf{x}(n)$ and desired response $ d(n)$ are drawn from a stationary environment, show that

      $\displaystyle J(\mathbf{w})=\frac{1}{2}\sigma_d^2-\mathbf{r}_{\mathbf{x}d}^T\mathbf{w} + \frac{1}{2}\mathbf{w}^T\mathbf{R}_{\mathbf{x}}\mathbf{w},$    

      where $ \sigma_d^2=E[d^2(n)]$, $ \mathbf{r}_{\mathbf{x}d}=E[\mathbf{x}(n)d(n) ]$, and $ \mathbf{R}_{\mathbf{x}}=E[\mathbf{x}(n)\mathbf{x}^T(n) ]$.
    2. For this cost function, show that the gradient vector and Hessian matrix of $ J(\mathbf{w})$ are as follows, respectively:

        $\displaystyle \mathbf{g} =-\mathbf{r}_{\mathbf{x}d} + \mathbf{R}_{\mathbf{x}} \mathbf{w}$      and    
        $\displaystyle \mathbf{H} =\mathbf{R}_{\mathbf{x}}.$    

    3. In the LMS/Newton algorithm, the gradient vector $ \mathbf{g}$ is replaced by its instantaneous value. Show that this algorithm, incorporating a learning rate parameter $ \eta$, is described by

      $\displaystyle \Hat{\mathbf{w}}(n+1)=\Hat{\mathbf{w}}(n)+\eta \mathbf{R}_{\mathbf{x}}^{-1}\mathbf{x}(n)\left[d(n)-\mathbf{x}^T(n)\Hat{\mathbf{w}}(n)\right].$    

      The inverse of the correlation matrix $ \mathbf{R}_{\mathbf{x}}$, assumed to be positive definite, is calculated ahead of time. (Haykin 3.8)
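
      A brief sketch of the recursion in part 3, assuming Gaussian inputs drawn with a known, positive definite correlation matrix $ \mathbf{R}_{\mathbf{x}}$ (the matrix, target weights, and noise level are made up for illustration); $ \mathbf{R}_{\mathbf{x}}^{-1}$ is computed ahead of time, as stated:

        import numpy as np

        rng = np.random.default_rng(1)
        w_true = np.array([0.5, -1.0])

        # Known input correlation matrix R_x = E[x x^T], assumed positive definite.
        R_x = np.array([[2.0, 0.5],
                        [0.5, 1.0]])
        R_x_inv = np.linalg.inv(R_x)       # computed ahead of time
        L = np.linalg.cholesky(R_x)        # used to draw x(n) with E[x x^T] = R_x

        eta = 0.05
        w_hat = np.zeros(2)
        for n in range(5000):
            x = L @ rng.standard_normal(2)                 # input x(n)
            d = w_true @ x + 0.01 * rng.standard_normal()  # desired response d(n)
            # LMS/Newton: w(n+1) = w(n) + eta * R_x^{-1} x(n) [d(n) - x(n)^T w(n)]
            w_hat = w_hat + eta * R_x_inv @ x * (d - x @ w_hat)

        print(w_hat)   # should end up close to w_true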


  5. A linear classifier divides the $ n$-dimensional input space into two half-spaces with an $ (n-1)$-dimensional hyperplane. Points are assigned to class $ \omega_1$ or $ \omega_2$ depending on which side of the hyperplane they lie on.
    1. Construct a linear classifier which is able to separate the following two-dimensional samples correctly:

        $\displaystyle \omega_1:$     $\displaystyle \{[2,1]^T\},$    
        $\displaystyle \omega_2:$     $\displaystyle \{[0,1]^T, [-1,1]^T \}.$    

    2. Is it possible to construct a linear classifier which is able to separate the following samples correctly?

        $\displaystyle \omega_1:$  $\displaystyle \{ [2,1]^T, [3,2]^T \},$    
        $\displaystyle \omega_2:$     $\displaystyle \{ [3,1]^T,[2,2]^T \}$    

      Justify your answer.
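
    Both cases can also be checked mechanically. Below is a minimal sketch, assuming a perceptron-style search over a bias-augmented weight vector (the helper name and epoch limit are illustrative; the exercise itself asks for an explicit construction and a geometric justification):

        import numpy as np

        def perceptron_separable(X1, X2, n_epochs=1000):
            # Label omega_1 as +1 and omega_2 as -1, augment with a bias term,
            # and run the perceptron rule over the pooled samples.
            X = np.vstack([np.column_stack([X1, np.ones(len(X1))]),
                           np.column_stack([X2, np.ones(len(X2))])])
            y = np.array([1] * len(X1) + [-1] * len(X2))
            w = np.zeros(X.shape[1])
            for _ in range(n_epochs):
                mistakes = 0
                for xi, yi in zip(X, y):
                    if yi * (w @ xi) <= 0:    # misclassified (or on the boundary)
                        w = w + yi * xi
                        mistakes += 1
                if mistakes == 0:
                    return True, w            # a separating weight vector was found
            return False, w

        # Part (a)
        print(perceptron_separable(np.array([[2, 1]]), np.array([[0, 1], [-1, 1]])))
        # Part (b)
        print(perceptron_separable(np.array([[2, 1], [3, 2]]), np.array([[3, 1], [2, 2]])))

    For separable data the perceptron rule stops after finitely many mistakes and returns a separating weight vector; if it keeps cycling, that is a hint (not a proof) that no separating line exists, and the geometric argument should still be given by hand.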





Jarkko Venna 2005-04-13