T-61.5020 Statistical Natural Language Processing
Answers 5 -- Markov chains and Hidden Markov Models
Version 1.0

1.
a)
The Markov chain is drawn in Figure 1.

\begin{figure}\centering\epsfig{file=Turku-Markov.eps,width=0.4\linewidth}
\caption{Weather as a Markov chain.}
\end{figure}
b)
Let's calculate the probability for state sequence $ \mathbf S = (S_3, S_2, S_1, S_1, S_1)$ when we know that we start from state $ S_2$:
\begin{eqnarray*}
P(\mathbf S \,\vert\, q_0=S_2) &=& P(q_1=S_3, q_2=S_2, q_3=S_1, q_4=S_1, q_5=S_1 \,\vert\, q_0=S_2) \\
&=& P(q_1=S_3 \,\vert\, q_0=S_2) \cdot P(q_2=S_2 \,\vert\, q_0=S_2, q_1=S_3) \\
&& \cdot\, P(q_3=S_1 \,\vert\, q_0=S_2, q_1=S_3, q_2=S_2) \\
&& \cdot\, P(q_4=S_1 \,\vert\, q_0=S_2, q_1=S_3, q_2=S_2, q_3=S_1) \\
&& \cdot\, P(q_5=S_1 \,\vert\, q_0=S_2, q_1=S_3, q_2=S_2, q_3=S_1, q_4=S_1)
\end{eqnarray*}

The probability on the first line is expanded step by step using the chain rule $P(A,B\vert C)=P(A\vert C)\cdot P(B\vert A, C)$.

Let's apply the Markov assumption:

\begin{eqnarray*}
P(\mathbf S \,\vert\, q_0=S_2) &=& P(q_1=S_3 \,\vert\, q_0=S_2) \cdot P(q_2=S_2 \,\vert\, q_1=S_3) \\
&& \cdot\, P(q_3=S_1 \,\vert\, q_2=S_2) \cdot P(q_4=S_1 \,\vert\, q_3=S_1) \\
&& \cdot\, P(q_5=S_1 \,\vert\, q_4=S_1)
\end{eqnarray*}

This corresponds to the coefficients $ a_{ij}$:
\begin{eqnarray*}
P(\mathbf S \,\vert\, q_0=S_2) &=& a_{23} \cdot a_{32} \cdot a_{21} \cdot a_{11} \cdot a_{11} \\
&=& 0.1 \cdot 0.3 \cdot 0.4 \cdot 0.8 \cdot 0.8 \\
&\approx& 0.0077
\end{eqnarray*}
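The product of transition coefficients can be checked with a short script. Only the transition probabilities used above are filled in; the mapping of the state labels to weather types (S1 sunny, S2 cloudy, S3 rainy) is an assumption taken from the context of the exercise.

```python
# Transition probabilities used above (only the needed entries of A;
# the labels S1=sunny, S2=cloudy, S3=rainy are an assumption here).
a = {
    (2, 3): 0.1,
    (3, 2): 0.3,
    (2, 1): 0.4,
    (1, 1): 0.8,
}

# State sequence q_0..q_5 = S2, S3, S2, S1, S1, S1
sequence = [2, 3, 2, 1, 1, 1]

# Multiply the transition probabilities along consecutive pairs.
p = 1.0
for prev, cur in zip(sequence, sequence[1:]):
    p *= a[(prev, cur)]

print(round(p, 4))  # 0.0077
```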

c)
The number of additional days that we stay in state $S_i$ after entering it follows a geometric distribution, $P(n) = a_{ii}^n (1-a_{ii})$: we stay for $n$ days and then leave. The expectation is
\begin{eqnarray*}
E(n) &=& \sum_{n=1}^\infty n \, a_{ii}^n (1-a_{ii}) \\
&=& (1-a_{ii}) \sum_{n=1}^\infty n \, a_{ii}^n \\
&=& (1-a_{ii}) \, \frac{a_{ii}}{(1-a_{ii})^2} \\
&=& \frac{a_{ii}}{1-a_{ii}},
\end{eqnarray*}
where the second-to-last step uses the series $\sum_{n=1}^\infty n x^n = \frac{x}{(1-x)^2}$ for $\vert x \vert < 1$.

In the case of sunny days $ a_{11}=0.8$, so we get $ \frac{0.8}{0.2} = 4$ days. This is the expected number of days we stay after entering the state, so the average number of successive sunny days is $ 4+1=5$.
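The closed form can be verified numerically; the sketch below truncates the infinite sum at a point where the remaining terms are negligible.

```python
# Expected number of additional days in the sunny state (a_11 = 0.8):
# E = sum_{n>=1} n * a^n * (1 - a), which should equal a / (1 - a) = 4.
a = 0.8

expected = sum(n * a**n * (1 - a) for n in range(1, 1000))  # truncated sum
closed_form = a / (1 - a)

print(round(expected, 6), round(closed_form, 6))  # 4.0 4.0
# Including the day on which we entered the state: 4 + 1 = 5 sunny days on average.
```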

2.
a) In this problem we want to calculate the probability of state $ S_i$ on the third day, when we know that it was sunny on the day of the departure and the temperatures on the three following days were $ \mathbf X = ( x_1, x_2, x_3) = (7^\circ C, 3^\circ C, -8^\circ C)$. Let's denote all the model parameters by $ \lambda=\{ \mathbf A , b_i(x) \}$.
\begin{eqnarray*}
\lefteqn{P (q_3=S_i \,\vert\, q_0=S_1, x_1, x_2, x_3, \lambda )} \\
&=& \frac{P (q_3=S_i, x_1, x_2, x_3 \,\vert\, q_0=S_1, \lambda)}{P (x_1, x_2, x_3 \,\vert\, q_0=S_1, \lambda)} \\
&=& \frac{P (q_3=S_i, x_1, x_2, x_3 \,\vert\, q_0=S_1, \lambda)}{\sum_{j=1}^3 P (q_3=S_j, x_1, x_2, x_3 \,\vert\, q_0=S_1, \lambda)}
\end{eqnarray*}

Above we first used the formula $ P(A\vert B,C)=\frac{P(A,B\vert C)}{P(B\vert C)}$, and then marginalized over the end state: $ P(A)=\sum_B P(A,B)$. The numerator and the denominator contain similar terms, so let's simplify the notation using the forward probability $ \alpha_t(i) = P(x_1, \ldots, x_t, q_t=S_i \,\vert\, q_0=S_1, \lambda)$:


$\displaystyle P (q_3=S_i ~\vert~ q_0=S_1, x_1, x_2, x_3, \lambda )$ $\displaystyle =$ $\displaystyle \frac{\alpha_3(i)}{\sum_{j=1}^3 \alpha_3(j)}$  

Now we will examine how this forward probability $ \alpha_t(i)$ can be calculated.
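As a sketch, the recursion can be written out in a few lines of Python. The two-state model below is hypothetical (the exercise's full transition matrix $\mathbf A$ and emission distributions $b_i$ are not repeated here); observations are indices of discretized temperature classes.

```python
# Forward algorithm sketch: alpha_t(i) = P(x_1..x_t, q_t = S_i | q_0, model).
# A and B are a hypothetical two-state model, not the exercise's values.
A = [[0.8, 0.2],   # A[j][i] = transition probability a_ji
     [0.5, 0.5]]
B = [[0.9, 0.1],   # B[i][x] = emission probability b_i(x), 2 observation classes
     [0.3, 0.7]]

def forward(observations, start_state):
    # Initialization: the state on day 0 is known with certainty.
    alpha = [1.0 if i == start_state else 0.0 for i in range(len(A))]
    for x in observations:
        # Sum over the previous day's states, then weight by the emission.
        alpha = [sum(alpha[j] * A[j][i] for j in range(len(A))) * B[i][x]
                 for i in range(len(A))]
    return alpha

alpha = forward([0, 1], start_state=0)
# Posterior of each state on the last day, as in the normalization above:
posterior = [a_i / sum(alpha) for a_i in alpha]
```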

Initial day

We know that it was sunny on the day of the departure, so the grid is initialized as
\begin{eqnarray*}
\alpha_0(1)&=&1\\
\alpha_0(2)&=&0\\
\alpha_0(3)&=&0
\end{eqnarray*}

\textit{Initialization of the grid for the forward algorithm.}

First day

It was sunny on the day before, and today's temperature $x_1 = 7^\circ C$ falls into the class $x \ge 5^\circ C$, so $\alpha_1(i) = a_{1i} \cdot b_i(x\ge5^\circ C)$. For example,
\begin{eqnarray*}
\alpha_1(3)&=&a_{13} \cdot b_3(x\ge5^\circ C) = 0.05\cdot 0.3 = 0.015
\end{eqnarray*}

\textit{Grid after the first day.}

Second day

Now we do not know the real weather of the previous day, so we sum over all its possible states: $\alpha_2(i) = \big( \sum_j \alpha_1(j) \, a_{ji} \big) \, b_i(-5^\circ C \!\le\! x \!\le\! 5^\circ C)$. For example,
\begin{eqnarray*}
\alpha_2(3)&=&(\ldots + 0.4 \cdot 0.015)\cdot 0.4 \\
&=& 6.000\cdot 10^{-4}
\end{eqnarray*}

\textit{Grid after the second day.}

Third day

Let's continue as before; now $x_3 < -5^\circ C$. For example,
\begin{eqnarray*}
\alpha_3(3)&=&(\ldots \cdot 6.000\cdot 10^{-4} )\cdot 0.3 \\
&=& 9.4387\cdot 10^{-4}
\end{eqnarray*}

\textit{The forward grid after the last day.}

Now we have calculated all the necessary quantities. Let's insert them into the equation:

\begin{eqnarray*}
P (q_3=S_1 \,\vert\, q_0=S_1, x_1, x_2, x_3, \lambda ) &=& \frac{\alpha_3(1)}{\sum_{j=1}^3 \alpha_3(j)} \\
&=& \frac{1.2144 \cdot 10^{-2}}{1.2144\cdot 10^{-2} + 1.4149\cdot 10^{-3} + 9.4387\cdot 10^{-4}} \\
&=& 0.8874
\end{eqnarray*}

So the probability that it is sunny on the day of the return is 89 %. In a similar manner we can calculate the probability of a cloudy (10 %) and a rainy (1 %) day.

b) This time we try to find the most probable sequence of weather states. It can be done using the Viterbi algorithm, which is very similar to the forward algorithm. The difference is that for each state we now calculate the probability along the single best path instead of summing over all paths.

Viterbi: Initial day

Let's initialize the grid as before:
\begin{eqnarray*}
\delta_0(1)&=&1\\
\delta_0(2)&=&0\\
\delta_0(3)&=&0
\end{eqnarray*}

\textit{Initialization of the grid for the Viterbi algorithm.}

Viterbi: First day

The best path to every state comes from sunny, the only state possible on day zero, so $\delta_1(i) = a_{1i} \cdot b_i(x\ge5^\circ C)$ (for example $\delta_1(3) = 0.015$) and the backpointers are
\begin{eqnarray*}
\psi_1(1)&=&1\\
\psi_1(2)&=&1\\
\psi_1(3)&=&1
\end{eqnarray*}

\textit{Viterbi search after the first day.}

Viterbi: Second day

Let's choose the most probable path coming to each state:
\begin{eqnarray*}
\delta_2(i)&=& \max_j ~ (\delta_1(j) \cdot a_{ji} \, b_i(-5^\circ C \!\le\! x \!\le\! 5^\circ C)) \\
\psi_2(i)&=& \mathop{\rm argmax}_j ~ (\delta_1(j) \cdot a_{ji} \, b_i(-5^\circ C \!\le\! x \!\le\! 5^\circ C))
\end{eqnarray*}
For example, $\psi_2(1) = 1$ and $\psi_2(3) = 3$.

\textit{Grid after the second day.}

When computing $ \psi_2(3)$ we note that the probabilities coming from states 1 and 3 were the same. We can make an arbitrary choice between them; here the choice was 3.



Viterbi: Third day

Let's choose the most probable path coming to each state, now with the observation class $x \le -5^\circ C$. For example,
\begin{eqnarray*}
\psi_3(1)&=& \mathop{\rm argmax}_j ~ ( \delta_2(j)\cdot a_{j1} \, b_1(x \le -5^\circ C)) = 2 \\
\psi_3(3)&=& \mathop{\rm argmax}_j ~ ( \delta_2(j)\cdot a_{j3} \, b_3(x \le -5^\circ C)) = 2
\end{eqnarray*}

\textit{Viterbi search after the last day.}

From the final grid we can read the most probable sequence of states: start from the most probable end state and follow the backpointers to the beginning. It seems that it has been sunny, cloudy and again sunny.

Conclusion: The difference between the forward and Viterbi algorithms

The forward algorithm gives the correct probability for each state at each time step. However, it cannot be used to read the most probable path from the grid.

In Viterbi search, the state probabilities are only approximations, since each accounts for just the best path. However, it finds the best path correctly.

Computationally the two algorithms are equally expensive: the summation in the forward algorithm is simply replaced by maximization in the Viterbi algorithm.
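This relationship can be seen directly in code: replacing the sum by a max and remembering which predecessor won turns the forward recursion into Viterbi. As before, the two-state model below is hypothetical, not the exercise's.

```python
# Viterbi sketch: the same grid as in the forward algorithm, but each cell keeps
# the probability of the single best path, plus backpointers psi for backtracking.
# A and B are a hypothetical two-state model.
A = [[0.8, 0.2],
     [0.5, 0.5]]
B = [[0.9, 0.1],
     [0.3, 0.7]]

def viterbi(observations, start_state):
    n = len(A)
    delta = [1.0 if i == start_state else 0.0 for i in range(n)]
    psi = []  # backpointers: psi[t][i] = best predecessor of state i at time t+1
    for x in observations:
        scores = [[delta[j] * A[j][i] * B[i][x] for j in range(n)]
                  for i in range(n)]
        psi.append([max(range(n), key=lambda j: scores[i][j]) for i in range(n)])
        delta = [max(scores[i]) for i in range(n)]  # max instead of sum
    # Backtrack from the most probable end state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for back in reversed(psi):
        state = back[state]
        path.append(state)
    return path[::-1]  # includes the known start state q_0

print(viterbi([0, 1], start_state=0))  # [0, 0, 1]
```

Ties in the max are broken here by taking the lowest state index, the same kind of arbitrary choice as in part b) above.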



svirpioj[a]cis.hut.fi