Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Courses in previous years: [ 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 ]


These pages are no longer updated. Please see the web pages of the Department of Information and Computer Science (ICS): http://ics.tkk.fi/en/studies/ (in Finnish: http://ics.tkk.fi/fi/studies/).


T-61.6050 Special Course in Computer and Information Science V L:

Nonlinear Dimensionality Reduction (6 cr ECTS)


Lecturers Amaury Lendasse, Francesco Corona
Assistant Kristian Nybo
Credits (ECTS) 6
Semester Autumn 2007 (during periods I and II)
Seminar sessions Tuesdays at 2 PM in room T4, starting 11.9
Language English
Web http://www.cis.hut.fi/Opinnot/T-61.6050/
Registration
E-mail first name dot surname at tkk dot fi

Introduction

Methods of dimensionality reduction are innovative and important tools in the fields of data analysis, data mining and machine learning. They provide a way to understand and visualize the structure of complex data sets. Traditional methods like principal component analysis and classical metric multidimensional scaling are limited by their underlying linear models. Until recently, very few methods were able to reduce the data dimensionality in a nonlinear way. Since the late nineties, however, many new methods have been developed, and nonlinear dimensionality reduction, also called manifold learning, has become a hot topic. Advances that account for this rapid growth include the use of graphs to represent the manifold topology and the use of new metrics, such as the geodesic distance. In addition, new optimization schemes based on kernel techniques and spectral decomposition have led to spectral embedding, which encompasses many of the recently developed methods.
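To make the contrast concrete, here is a minimal Python sketch (an illustration, not part of the course material) of the linear baseline: PCA via the singular value decomposition, applied to a "Swiss roll", a standard nonlinear manifold that linear projections cannot unroll.

```python
import numpy as np

# Generate a 3-D "Swiss roll": a 2-D sheet curled up in 3-D space.
rng = np.random.default_rng(0)
n = 500
t = 1.5 * np.pi * (1 + 2 * rng.random(n))           # angle along the roll
height = 21 * rng.random(n)                         # position across the roll
X = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

def pca(X, d):
    """Project X onto its d leading principal components."""
    Xc = X - X.mean(axis=0)                         # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                            # coordinates in the top-d subspace

Y = pca(X, 2)
print(Y.shape)  # (500, 2)
```

The projection preserves as much variance as any linear map can, but nearby points on opposite layers of the roll remain mixed; undoing the curl requires the nonlinear methods studied in this course.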

This course describes existing and advanced methods to reduce the dimensionality of numerical databases. For each method, the description starts from intuitive ideas, develops the necessary mathematical details and ends by outlining the algorithmic implementation. Methods are compared with each other with the help of different illustrative examples.

The purpose of the course is to summarize clear facts and ideas about well-known methods as well as recent developments in the topic of nonlinear dimensionality reduction. With this goal in mind, methods are all described from a unifying point of view, in order to highlight their respective strengths and shortcomings.

Requirements for passing the course

In order to pass the course, each student must give a presentation in the seminar, participate actively, and complete a coursework project.

Material

We use the book

Nonlinear Dimensionality Reduction
Series: Information Science and Statistics
Lee, John A.; Verleysen, Michel
2007, approx. 300 p., hardcover
ISBN: 978-0-387-39350-6

Schedule

Time Lecturer Topic Slides
11.9. Amaury Lendasse High-Dimensional Data
18.9. Elia Liitiäinen Characteristics of an Analysis Method pdf
25.9. Kristian Nybo Estimation of the Intrinsic Dimension pdf
2.10. Markus Ojala Distance Preservation I pdf
9.10. Niko Vuokko Distance Preservation II pdf
16.10. Amaury Lendasse PROJECT ppt, pdf
23.10. Laszlo Kozma and Dusan Sovilj Topology Preservation I Laszlo, Dusan
6.11. Antti Sorjamaa and Yoan Miche Topology Preservation II Antti, Yoan
13.11. Andrey Ermolov Method Comparisons pdf
20.11. Emil Eirola and Lauri Oksanen Conclusions Emil, Lauri


Project

The reports should be handed in by email to both lecturers and the assistant no later than 3:45 PM on 21 December. The reports should be written using LaTeX and the ESTSP template (tex file, cls file). The maximum length is 12 pages. For further information on the project, see Amaury Lendasse's slides above.

You can download the latest version of John Lee's toolbox here. Get the LSSVM toolbox here. Emil Eirola has also provided a simple example of using the LSSVM toolbox.

A bug in John Lee's toolbox

Markus Ojala reported a bug and a possible workaround:

There is a bug in John Lee's NLDR software "NLPm": one cannot load a validation set with the command "iv" (Interpolate validation set). The command tries to load only a binary file, and we did not find any way to save the files in the appropriate binary format. This is needed at least for the Chemometrics and Time Series datasets.

However, there is a workaround: perform the mapping with the learning set and save the mapping with the command "sm". Modify your test data to have the same size as the learning data, for example by copying the last point the required number of times; without this, the software does not allow the test data to be loaded. After that you can load the test data and the previously saved mapping file, effectively tricking the software, and interpolate the test set with "il". After the projection, simply remove the copies of the last point.
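The padding step of the workaround can be sketched as follows (a hypothetical Python illustration; the actual interpolation is done inside NLPm, which is stood in for here by keeping the first two coordinates):

```python
import numpy as np

def pad_to_size(test, n_learn):
    """Repeat the last test point until the set has n_learn points."""
    extra = n_learn - len(test)
    pad = np.repeat(test[-1:], extra, axis=0)
    return np.vstack([test, pad])

def trim_projection(projected, n_test):
    """Afterwards, drop the projections of the padded copies."""
    return projected[:n_test]

rng = np.random.default_rng(0)
learn = rng.random((100, 3))                # learning set: 100 points in 3-D
test = rng.random((60, 3))                  # test set: only 60 points
padded = pad_to_size(test, len(learn))      # now 100 points, same as learning set
# ... load `padded` into NLPm and interpolate with "il"; as a stand-in,
# pretend the 2-D embedding is the first two coordinates:
embedded = trim_projection(padded[:, :2], len(test))
```

Only the first `len(test)` rows of the interpolated output are meaningful; the rest are projections of the repeated last point and are discarded.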

Laszlo Kozma pointed out that the Windows version of the toolbox, available on John Lee's website, does not have this bug.



Page maintained by t616050@cis.hut.fi, last updated Tuesday, 19-Aug-2008 10:51:04 EEST