T-61.184 Statistical (Adaptive) Language Modeling
Special course on Information Technology IV (4 ov, L)

Lecturers: D.Sc.(Tech) Krista Lagus and Prof. Mikko Kurimo
Semester: Autumn 2001
Credit points: 4 cr (or 2 cr)
Place: Seminar room Y405 in the main building of HUT
Time: Thursdays 14-16, starting from September 20
Language: English or Finnish
Homepage: http://www.cis.hut.fi/Opinnot/Tik-61.184/

Seminar course description

Many tasks related to natural language processing can be approached by estimating a suitable model of the problem from a large amount of sample data and then applying that model to solve the problem. The purpose of this course is to learn about the statistical and adaptive methods used in natural language modeling, and to form an understanding of the problem and of some of the applications of language models.

Many of the methods covered in this course are basic tools used in current research at the Laboratory of Computer and Information Science. The methods will include, for example, LSA and vector space models and their extensions, SOM and clustering, hidden Markov models, and word N-grams. Generative and descriptive approaches, as well as hierarchical, structural, and Bayesian modeling approaches, will also be discussed.

At least the following application areas of language modeling will be presented: speech recognition (with morphological and topic modeling), topic detection and analysis, and information retrieval.

The course is based on selected chapters from the book Speech and Language Processing by Daniel Jurafsky & James H. Martin, and on some related journal articles, such as:

  • Jerome Bellegarda: Exploiting Latent Semantic Information in Statistical Language Modeling
  • Yoshua Bengio et al.: A Neural Probabilistic Language Model

Prerequisites

Some background in natural language processing is helpful, as is background in mathematics, especially probability theory and adaptive methods. However, our approach will be rather practical, and for most methods it will be more important to understand how they work in general than to master the details.

Requirements for passing the course

To pass the course (4 cr), you have to
  1. participate actively,
  2. give a seminar talk on a book chapter or a journal article,
  3. generate one or two homework questions or exercises about your assigned area for other people to solve,
  4. solve a set of exercises given during the seminar (at least 60%), and
  5. carry out a project work, that is, a small-scale research project.
To pass with distinction, the seminar talk, the handouts, and the project work must each be very good, and 95% of the exercises must be solved. Without the project work, the course can be completed for two credits (2 cr).

Signing up for the course

Sign up preferably through WWWTopi (https://webtopi.hut.fi/) or by showing up at the first meeting on September 20th. If you cannot make it to the first meeting, send e-mail to krista.lagus@hut.fi.

Relationship to other studies

At TKK the course is suited for the Language Technology major or minor (Kieliteknologian pää/sivuaine) and for studies in Information Technology. It can be combined with T-61.281 Luonnollisten kielten tilastollinen käsittely (Statistical Natural Language Processing), lectured in the spring.

Students and staff from the KIT network (Kieliteknologian opetuksen verkosto, the network for language technology education) are also welcome -- please sign up in advance. If the participants prefer, the course can be arranged partly in an intensive format. This will be discussed in the first session.

More information

Krista.Lagus@hut.fi (tel. 451 3276)
Mikko.Kurimo@hut.fi (tel. 451 5388)


http://www.cis.hut.fi/Opinnot/T-61.6040/s01/kurssiesite.shtml
t61184@mail.cis.hut.fi
Monday, 24-Sep-2001 12:25:14 EEST