Tik-61.182 Special Course in Information Technology (4 cr) (L)

Prof. Samuel Kaski, Prof. Erkki Oja, MSc Krista Lagus
Semester: Spring 2000
Credit points: 4 cr
Place: Lecture hall T4 in the computer science building
Time: Thursdays 14-16, starting from January 20
Homepage: http://www.cis.hut.fi/Opinnot/Tik-61.182/

Information retrieval and statistical natural language processing

Large text collections, that is, unstructured natural language material, are nowadays processed in so many different applications that the field itself hardly needs motivation. This course provides an overview of the statistical and neural methods used for retrieving desired information and, more generally, for processing natural language texts. Applications of automatic natural language processing are already common on the Web, for example search engines, automatic translation, and text categorization.

Natural language processing and information retrieval both have long research traditions. Most of the traditional methods, however, were designed for small-scale data sets and are not feasible for the huge text corpora available today. A recent trend has been to use statistical methods. Many of these methods are related to the statistical and computational intelligence methods studied in the Laboratory of Computer and Information Science, and some are covered in other courses of the laboratory. During this course we will study the background of statistical natural language processing and survey the state of the art in natural language applications of these methods.

No linguistic background is assumed; the emphasis of the course is not on linguistic aspects but on general-purpose statistical methods and their potential applications in natural language processing.

The course is based on the following material:

1. Christopher D. Manning and Hinrich Schütze: Foundations of Statistical Natural Language Processing, MIT Press, 1999. Available at least from online bookstores.

2. Research articles, in particular on neural approaches.

Small project work

During the course each student carries out a small-scale research project on some aspect of information retrieval or statistical natural language processing. The project consists of a literature study and an experimental part. The goal of the literature study is to identify what is essential in the given material. In the experimental part, a small-scale experiment or programming task is designed and carried out using the methods covered in the literature.

To pass the course (4 cr), each student has to

  1. participate actively,
  2. give a seminar talk on a part of the book or some papers,
  3. prepare a brief summary of the material, in the form of clear transparencies, and
  4. carry out a project consisting of a programming or experimental task, present the results in the seminar, and write a short report on the project.
To pass with distinction, the seminar talk, the transparencies and the project work must each be very good.


Thursday, 20-Jan-2000 17:03:00 EET