T-61.184 Statistical (Adaptive) Language Modeling
Special course on Information Technology
IV (4 ov, L)
Lecturers: D.Sc.(Tech) Krista Lagus
and Prof. Mikko Kurimo
Semester: Autumn 2001
Credit points: 4 cr (or 2 cr)
Place: Seminar room Y405 in the main building of HUT
Time: Thursdays 14-16, starting from September 20
Language: English or Finnish
Seminar course description
Many tasks related to natural language processing can be approached by
estimating a suitable model of the problem based on a large amount of
sample data and the subsequent application of that model for solving
the task.
The purpose of this course is to learn about the statistical and
adaptive methods utilized in natural language modeling, and to form an
understanding of the problem and of some of the applications of
Many of the methods covered in this course are basic tools used in
current research in the Laboratory of Computer and Information
Science. The methods will include e.g. LSA and vector space models
and their extensions, SOM and clustering, hidden Markov models, and word
N-grams. Generative and descriptive, hierarchical and structural, and
Bayesian modeling approaches will also be discussed.
At least the following application areas of language modeling will be
presented: speech recognition (with morphological and topic modeling),
topic detection and analysis, and information retrieval.
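As a rough illustration of the estimate-then-apply idea and of word
N-grams mentioned above, a bigram model could be sketched as follows.
This is not course material: the toy corpus, the function names and the
choice of add-one smoothing are assumptions made only to keep the
example short.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate bigram counts from whitespace-tokenized sample sentences."""
    contexts, bigrams = Counter(), Counter()
    vocabulary = {"</s>"}  # words that can be predicted, plus the end marker
    for sentence in sentences:
        words = sentence.lower().split()
        vocabulary.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocabulary


def bigram_probability(model, previous, word):
    """P(word | previous) with add-one smoothing (an assumed, simple choice)."""
    contexts, bigrams, vocabulary = model
    return (bigrams[(previous, word)] + 1) / (contexts[previous] + len(vocabulary))


def sentence_probability(model, sentence):
    """Apply the estimated model: multiply bigram probabilities along a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens[:-1], tokens[1:]):
        probability *= bigram_probability(model, previous, word)
    return probability
```

With a model trained on a handful of sentences, sentence_probability
gives a higher score to word sequences resembling the sample data than
to unseen ones. Practical models rely on far larger corpora and better
smoothing.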
The course is based on selected chapters from the following book:
Daniel Jurafsky & James H. Martin: Speech and Language
and some related journal articles, such as:
- Jerome Bellegarda: Exploiting Latent Semantic Information in
Statistical Language Modeling
- Yoshua Bengio et al.: A Neural Probabilistic Language Model
Some background on natural language processing is helpful, as well as
on mathematics, especially probability theory and adaptive methods.
However, our approach will be rather practical and for most methods it
will be more important to understand how they work in general than to
master the details.
Requirements for passing the course
To pass the course (4 cr), you have to
- participate actively,
- give a seminar talk on a book chapter or a journal article,
- generate one or two homework questions or exercises about your
assigned area for other people to solve,
- solve a set of exercises given during the seminar (at least 60%), and
- carry out a small project work, that is, a small-scale research project.
To pass with distinction, the seminar talk, the handouts, and the
project work must each be very good, and 95% of the exercises
should be solved.
Without the project work the course can be completed to obtain two
credits.
Signing up for the course
Sign up preferably via WWWTopi (https://webtopi.hut.fi/) or by showing
up at the first meeting on September 20th. If you cannot make it, send
e-mail to firstname.lastname@example.org.
Relationship to other studies
At TKK the course is suited for the Language Technology major or minor
(Kieliteknologian pää/sivuaine) and for studies in Information
Technology. It can be combined with T-61.281 Luonnollisten kielten
tilastollinen käsittely (Statistical Processing of Natural Languages),
lectured in the spring.
Students and staff from the KIT network
(Kieliteknologian opetuksen verkosto, the Language Technology teaching
network) are also welcome; please sign up in advance.
If the participants prefer, the course can be arranged partially in
an intensive format. This will be discussed in the first session.
Krista.Lagus@hut.fi (tel. 451 3276)
Mikko.Kurimo@hut.fi (tel. 451 5388)
Monday, 24-Sep-2001 12:25:14 EEST