sol9_2003

T-61.263 Advanced course in neural computing

Solutions for exercise 9

1. We start with the notion that a neuron flips from state to at temperature with probability
  
  $\displaystyle P(x_j \rightarrow -x_j) = \frac{1}{1+\exp(-\Delta E_j/T)}$ (1)
  
  where $\Delta E_j$ is the energy difference resulting from such a flip. The energy function of the Boltzmann machine is defined by
  
  $\displaystyle E = -\frac{1}{2} {\sum_i \sum_j}_{i \neq j} w_{ji} x_i x_j \;.$
  Hence the energy change produced by neuron flipping from state to is
  
  $\begin{multline} \Delta E_j = (\text{energy with neuron $j$\ in state $x_j$}) -... ... \sum_i w_{ji} x_i \\ = -2 x_j \sum_i w_{ji} x_i = -2 x_j v_j \end{multline}$
  
  where is the induced local field of neuron .
2. In light of the result in Eq.(2), we may rewrite Eq.(1) as
  
  $\displaystyle P(x_j \rightarrow -x_j) = \frac{1}{1 + \exp(2x_j v_j/T)} \;.$
  This means that for an initial state , the probability that neuron is flipped into state is
  
  $\displaystyle \frac{1}{1+\exp(-2v_j/T)} \;.$ (2)
3. For an initial state of , the probability that neuron is flipped into state is
  
  $\displaystyle \frac{1}{1+\exp(+2v_j/T)} = 1-\frac{1}{1+\exp(-2v_j/T)} \;.$ (3)
  
  The flipping probability in Eq.(4) and the one in Eq.(3) are in perfect agreement with the following probabilistic rule:
  
  $\displaystyle x_j = \left\{ \begin{array}{l} +1 \; \text{with probability}\; P(v_j) \\ -1 \; \text{with probability}\; 1-P(v_j) \\ \end{array} \right.$
  where is itself defined by
  
  $\displaystyle P(v_j) = \frac{1}{1+\exp(-2v_j/T)} \;.$
The Boltzmann machine and sigmoid belief network share a common feature: they are both stochastic machines with their theory rooted in statistical mechanics.
They differ from each other in the following respects:
- The Boltzmann machine is a recurrent network whereas the sigmoid belief network is an acyclic feedforward network.
- The learning process in a Boltzmann machine involves two phases: one clamped (positive) and the other free running (negative). The negative phase is eliminated from the sigmoid belief network.
Writing the system of simultaneous equations (Haykin, Eq. 12.22) in matrix form:

$\displaystyle \mathbf{J}^{\mu} = \mathbf{C}(\mu) + \gamma \mathbf{P}(\mu) \mathbf{J}^{\mu}$ (4)

where

$\displaystyle \mathbf{J}^{\mu} = [J^{\mu}(1), J^{\mu}(2), \ldots, J^{\mu}(N)]^T$

$\displaystyle \mathbf{c}(\mu) = [c(1,\mu), c(2,\mu), \ldots, c(N,\mu)]^T$

$\displaystyle \mathbf{P}(\mu) = \left[ \begin{array}{cccc} p_{11}(\mu) & p_{12}... ... p_{N1}(\mu) & p_{N2}(\mu) & \ldots & p_{NN}(\mu) \\ \end{array} \right] \;.$
Rearranging terms in Eq.(5):

$\displaystyle (\mathbf{I} - \gamma \mathbf{P}(\mu)) \mathbf{J}^{\mu} = \mathbf{c}(\mu)$
where $\mathbf{I}$ is the -by- identity matrix. For the solution $\mathbf{J}^{\mu}$ to be unique we require that the -by- matrix $(\mathbf{I}-\gamma\mathbf{P}(\mu))$ have an inverse matrix for all possible values of the discount factor $\gamma$ .
An important property of dynamic programming is the monotonicity property described by

$\displaystyle J^{\mu_{n+1}} \le J^{\mu_n} \;.$
This property follows from the fact that if the terminal cost for stages is changed to a uniformly larger cost $\bar g_k$ , that is,

$\displaystyle \bar g_k(x_k) \ge g_k(x_k) \;$ for all $\displaystyle \; x_k$
then the last stage cost-to-go function $J_{K-1}(x_{K-1})$ will be uniformly increased. In more general terms, we may state the following. Given two cost-to-go functions $J_{K+1}$ and $\bar J_{K+1}$ with

$\displaystyle \bar J_{K+1}(x_{K+1}) \ge J_{K+1}(x_{K+1})\;$ for all $\displaystyle \; x_{K+1}$
we find that for all and $\mu_K$ the following relation holds:

$\displaystyle E[ g_K(x_K,\mu_K,x_{K+1}) + J_{K+1}(x_{K+1})] \le E[g_K(x_K,\mu_K,x_{K+1}) + \bar J_{K+1}(x_{K+1})] \;.$
This relation merely restates the monotonicity property of the dynamic programming algorithm.

Jaakko Peltonen
2003-11-21