T-61.5020 Statistical Natural Language Processing
Exercises 2 -- Entropy and perplexity
Version 1.0
| W | Translation | P(W) |
| 'kissa' | (cat) | |
| 'tuuli' | (wind) | |
| 'kiipeilijä' | (climber) | |
| 'naukaisi' | (meowed) | |
| 'tuivertaa' | (blows) | |
| 'katosi' | (disappeared) | |
| | 'naukaisi' | 'tuivertaa' | 'katosi' |
| 'kissa' | | 0 | |
| 'tuuli' | | | |
| 'kiipeilijä' | | 0 | |
| Model 1 | Model 2 |
| P(word='kissa')=0.1 | P(word=subject)=0.33 |
| P(word='koira')=0.1 | P(word=verb)=0.33 |
| P(word='valas')=0.1 | P(word=object)=0.33 |
| P(word='kala')=0.1 | |
| P(word='istui')=0.1 | |
| P(word='menee')=0.1 | |
| P(word='on')=0.1 | |
| P(word='puuhun')=0.1 | |
| P(word='kuuhun')=0.1 | |
| P(word='suuhun')=0.1 | |
| Model 3 | |
| P(word='kissa' | word=first) | =0.25 |
| P(word='koira' | word=first) | =0.25 |
| P(word='valas' | word=first) | =0.25 |
| P(word='kala' | word=first) | =0.25 |
| P(word='istui' | previous_word | =0.33 |
| P(word='menee' | previous_word | =0.33 |
| P(word='on' | previous_word | =0.33 |
| P(word='puuhun' | previous_word | =0.33 |
| P(word='kuuhun' | previous_word | =0.33 |
| P(word='suuhun' | previous_word | =0.33 |
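Perplexity can be defined as the inverse of the geometric mean of the probabilities:
$$\mathrm{Perp}(w_1, \dots, w_N) = \left( \prod_{i=1}^{N} P(w_i) \right)^{-1/N}$$
where $P(w_i)$ is the probability the model assigns to the $i$-th word of an $N$-word test text.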
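As a rough illustration of this definition, the following Python sketch computes the inverse geometric mean of a list of per-word probabilities. The function name `perplexity`, the dictionary `model1`, and the three-word test sentence are assumptions made for this example and are not part of the exercise. The computation is done in log space to avoid underflow on long texts, and a zero probability is treated as giving infinite perplexity.

```python
import math

def perplexity(probs):
    """Inverse of the geometric mean of the given per-word probabilities."""
    if not probs:
        raise ValueError("need at least one probability")
    if any(p < 0.0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if any(p == 0.0 for p in probs):
        # A single zero-probability word makes the geometric mean zero.
        return math.inf
    log_sum = sum(math.log(p) for p in probs)
    # exp(-mean log p) equals (product of p)^(-1/N)
    return math.exp(-log_sum / len(probs))

# Model 1 from the table above: ten words, each with probability 0.1.
words = ["kissa", "koira", "valas", "kala", "istui",
         "menee", "on", "puuhun", "kuuhun", "suuhun"]
model1 = {w: 0.1 for w in words}

# Hypothetical test sentence, chosen only for illustration.
sentence = ["kissa", "istui", "puuhun"]
ppl_model1 = perplexity([model1[w] for w in sentence])
print(ppl_model1)
```

For a uniform model such as Model 1, the result equals the vocabulary size (up to floating-point rounding), which matches the reading of perplexity as an effective number of equally likely choices per word.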