Lecturers | PhD (Eng.) Amaury Lendasse |
---|---|
Assistants | M.Sc. Elia Liitiäinen |
Credits (ECTS) | 6 |
Semester | Autumn 2006 (during periods I and II) |
Seminar sessions | Tuesdays at 14-16 in the Computer Science building, Konemiehentie 2, Otaniemi, Espoo. The first session is on 19.9.2006 in lecture hall T4. |
Language | English |
Web | http://www.cis.hut.fi/Opinnot/T-61.6040/ |
E-mail | eliitiai (at) cc.hut.fi, lendasse (at) hut.fi |
Many machine learning problems today involve a large number of features, for example in DNA and biomedical data analysis, image processing, financial data mining, and chemometrics. In other cases the number of features is smaller, but of the same order of magnitude as the number of samples. In both situations, regression tasks face the curse of dimensionality: overfitting appears easily, and in some cases the regression problem becomes ill-posed (not identifiable). The challenge is then to reduce the number of features in order to improve the efficiency of the regression. Interpretability is often a major concern too, since a large number of features usually prevents any understanding of the underlying relationship. Feature selection and dimension reduction are two different ways of reducing the number of inputs of the regression model. In the first, inputs are selected among the original features; this is usually called feature selection or input selection. In the second, new inputs are built from the original features by combining them in a linear or nonlinear way; this is dimension reduction. This course covers both families of methods. Their goal is twofold. First, reducing the number of input variables counters the curse of dimensionality, which makes it possible to improve the generalization performance of the regression model. Second, a reduced set of variables is of great importance in real applications, because it makes the relationship between features and outputs easier to interpret. |
Seminar course
Each student gives a presentation in the seminar. In addition, the requirements include project work and active participation in the lectures (one absence is allowed). |
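
The course description distinguishes selecting inputs among the original features from building new inputs out of them. The following is a minimal sketch of that distinction, assuming NumPy and synthetic data. The correlation ranking and the PCA projection are simple stand-ins chosen for illustration, not methods prescribed by the course, and the function names `select_features` and `reduce_dimension` are hypothetical.

```python
import numpy as np


def select_features(X, y, k):
    """Feature selection: keep the k original columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute Pearson correlation between every column of X and the output y.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    # Indices of the k largest correlations, returned in ascending column order.
    idx = np.sort(np.argsort(corr)[-k:])
    return idx, X[:, idx]


def reduce_dimension(X, k):
    """Dimension reduction: build k new inputs as linear combinations (PCA)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Synthetic data (an assumption for this sketch): only columns 3 and 7 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=200)

idx, X_sel = select_features(X, y, k=2)  # two of the original columns
Z = reduce_dimension(X, k=2)             # two new, combined inputs
```

Here `X_sel` keeps two of the original columns, so the retained inputs stay directly interpretable, whereas each column of `Z` mixes all ten features.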
Time | Lecturer and references | Topic | Slides |
---|---|---|---|
19.9. | Amaury Lendasse | Presentation of the course | slides |
26.9. | Qi Yu, Mikko Korpela [1],[2],[3] | Introduction to variable selection | slides1, slides2 |
3.10. | Heli Hiisilä [4],[5] | Model validation | slides |
10.10. | Antti Ajanki [6] | Variable selection for linear models | slides |
17.10. | Antti Sorjamaa [7],[8],[9] | Local linear methods for variable selection | slides |
24.10. | Dmitrij Lagutin, Elia Liitiäinen [10],[11] | Feature selection for MLP networks + Automatic Relevance Determination | slides1, slides2 |
31.10. | No presentation | | |
7.11. | Jaakko Väyrynen [12],[13] | Genetic algorithm for variable selection | slides |
14.11. | Yoan Miché [14] | Mutual information for variable selection | |
21.11. | Jussi Ahola [18] | PLS | |
28.11. | No presentation | | |
5.12. | Discussion of projects | | |
[1] J. Hao. Input selection using mutual information - applications to time series prediction. Master's thesis, Helsinki University of Technology, 2005.
For more information, please send email to eliitiai (at) cc.hut.fi or
lendasse (at) hut.fi.
You are at: CIS → T-61.6040 Special Course in Computer and Information Science IV
Page maintained by eliitiai@cc.hut.fi, last updated Monday, 29-Jan-2007 10:21:00 EET