Tik-61.182 Informaatiotekniikan erikoiskurssi (4 ov) (L)
Information retrieval and statistical natural language processing
Large collections of unstructured natural language text are nowadays processed in so many different applications that the field itself hardly needs motivation. This course provides an overview of statistical and neural methods used in retrieving wanted information and, more generally, in processing natural language texts. Applications that use automatic natural language processing methods already exist on the Web, for example search engines, automatic translation, and categorization of texts.
Natural language processing and information retrieval both have long research traditions. Most of the traditional methods were, however, designed for small-scale data sets and are not feasible for the huge text corpora now available. A recent trend has been to use statistical methods. Many of these methods are related to the statistical and computational intelligence methods that are research topics of the Laboratory of Computer and Information Science, and some are even covered in other courses of the laboratory. During this course we will study the background of statistical natural language processing and survey the state of the art in natural language applications of these methods.
No linguistic background is assumed; the emphasis of the course is not on the linguistic aspects but on general-purpose statistical methods and their application potential in natural language processing.
The course is based on the following material:
1. Christopher D. Manning and Hinrich Schütze: Foundations of Statistical Natural Language Processing, MIT Press, 1999. Available at least in electronic bookstores.
2. Research articles, in particular on neural approaches.
Small project work
During the course each student carries out a small-scale research project on some aspect of information retrieval or statistical natural language processing. The project consists of a literature study and an experimental part. The goal of the literature study is to find out what is essential in the given material. In the experimental part a small-scale experimental or programming study is designed and carried out using the methods treated in the literature.
To pass the course (4cr), each student has to
Thursday, 20-Jan-2000 17:03:00 EET