Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Courses in previous years: [ 1999 | 2000 | 2001 | 2002 | 2003 | 2004 ]


T-61.6040 Special Course in Computer and Information Science IV L:

Variable Selection for Regression


Lecturer PhD (Eng.) Amaury Lendasse
Assistant M.Sc. Elia Liitiäinen
Credits (ECTS) 6
Semester Autumn 2006 (during periods I and II)
Seminar sessions On Tuesdays at 14-16 in the computer science building,
Konemiehentie 2, Otaniemi, Espoo. The first session is on 19.9.2006 in lecture hall T4.
Language English
Web http://www.cis.hut.fi/Opinnot/T-61.6040/
E-mail eliitiai (at) cc.hut.fi
lendasse (at) hut.fi

Introduction

Nowadays, many machine learning problems involve a large number of features. This is the case, for example, in DNA and biomedical data analysis, image processing, financial data mining, and chemometrics. In other cases, the number of features may be smaller, but still of the same order of magnitude as the number of samples. In both situations, regression tasks face the curse of dimensionality: overfitting appears easily, and in some cases the regression problem becomes ill-posed (or not identifiable). The challenge is then to reduce the number of features in order to improve the regression efficiency. Interpretability is often a major concern too, as a large number of features usually prevents any understanding of the underlying relationship.
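The ill-posedness mentioned above can be made concrete with a minimal pure-Python sketch (the data values are invented for illustration): once a linear model has as many free parameters as there are samples, it can interpolate any training targets exactly, so zero training error says nothing about the true relationship.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))  # pivot row
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# Three samples, three features: as many weights as samples, so a linear
# model can fit *any* targets exactly, whether or not the features are
# informative -- the regression problem is not identifiable from this data.
X = [[1.0, 2.0, 0.5],
     [0.3, 1.1, 2.2],
     [2.5, 0.4, 1.0]]
y = [1.0, -2.0, 0.7]          # arbitrary targets
w = solve(X, y)
residuals = [sum(x_j * w_j for x_j, w_j in zip(row, w)) - t
             for row, t in zip(X, y)]  # all (numerically) zero
```

Adding more features than samples only makes this worse: infinitely many weight vectors then fit the training data perfectly, which is one motivation for reducing the number of inputs.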

Feature selection and dimension reduction are two different ways of reducing the number of inputs of a regression model. In the first, inputs are selected among the original features; this is usually referred to as feature selection or input selection. In the second, inputs are built from the original features by combining them in a linear or nonlinear way; this leads to dimension reduction. In this course, both feature selection and dimension reduction methods are covered.
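The distinction can be sketched in a few lines of plain Python. Note that the variance ranking and the fixed projection matrix below are simplistic stand-ins for the real criteria covered in the course (e.g. mutual information, wrappers, or PCA):

```python
def select_features(X, k):
    """Feature selection: keep the k original columns with highest variance.
    (Variance is an illustrative stand-in for a real relevance criterion.)"""
    n, d = len(X), len(X[0])
    def var(j):
        col = [row[j] for row in X]
        mean = sum(col) / n
        return sum((v - mean) ** 2 for v in col) / n
    keep = sorted(range(d), key=var, reverse=True)[:k]
    return [[row[j] for j in keep] for row in X], keep

def reduce_dimension(X, W):
    """Dimension reduction: build new inputs as linear combinations of ALL
    original features; each column of W defines one combination."""
    return [[sum(x_j * w_j for x_j, w_j in zip(row, col)) for col in W]
            for row in X]

X = [[1.0, 10.0, 0.1],
     [2.0, 20.0, 0.1],
     [3.0, 15.0, 0.1]]
# Selection returns a subset of the original columns (here columns 1 and 0),
# so each kept input is still directly interpretable.
Xs, keep = select_features(X, 2)
# Reduction mixes all columns into one new input (here col0 + col1), which
# can capture more information but is harder to interpret.
Z = reduce_dimension(X, [[1.0, 1.0, 0.0]])
```

Selection preserves the meaning of each input, while reduction trades interpretability for the freedom to combine features; this trade-off recurs throughout the methods presented in the seminar.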

The goal of feature selection and dimension reduction is twofold. First, reducing the number of input variables fights the curse of dimensionality, making it possible to improve the generalization performance of the regression. Second, a reduced set of variables is of utmost importance in real applications, as it allows an easier interpretation of the relationship between features and outputs.

Course format

Seminar course

Requirements for passing the course

Each student gives a presentation in the seminar. In addition, requirements include a project work and active participation in the lectures (one absence is allowed).

Schedule

Time Lecturer and references Topic Slides
19.9. Amaury Lendasse Presentation of the course slides
26.9. Qi Yu, Mikko Korpela [1],[2],[3] Introduction to variable selection slides1, slides2
3.10. Heli Hiisilä [4],[5] Model validation slides
10.10. Antti Ajanki [6] Variable selection for linear models slides
17.10. Antti Sorjamaa [7],[8],[9] Local linear methods for variable selection slides
24.10. Dmitrij Lagutin, Elia Liitiäinen [10],[11] Feature selection for MLP networks + Automatic Relevance Determination slides1, slides2
31.10. No presentation
7.11. Jaakko Väyrynen [12],[13] Genetic algorithms for variable selection slides
14.11. Yoan Miché [14] Mutual information for variable selection
21.11. Jussi Ahola [18] PLS
28.11. No presentation
5.12. Discussion of projects

References

[1] J. Hao. Input selection using mutual information - applications to time series prediction. Master's thesis, Helsinki University of Technology, 2005.
[2] R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 1997.
[3] I. Guyon, S. Gunn, M. Nikravesh and L. Zadeh. Feature Extraction, chapter Embedded Methods. Springer.
[4] A. Lendasse, G. Simon, V. Wertz and M. Verleysen. Fast bootstrap methodology for model selection. Neurocomputing, 2005.
[5] A. Lendasse. Testing neural models: how to use re-sampling techniques?
[6] B. Efron, T. Hastie, L. Johnstone and R. Tibshirani. Least angle regression. Annals of Statistics, 2004.
[7] A. Sorjamaa, N. Reyhani and A. Lendasse. Input and structure selection for k-nn approximator.
[8] A. Sorjamaa, A. Lendasse and M. Verleysen. Pruned lazy learning models for time series prediction.
[9] C. Atkeson, A. Moore and S. Schaal. Locally weighted learning. AI Review, 11:11-73, April 1997.
[10] P. Leray and P. Gallinari. Feature selection with neural networks. Behaviormetrica, 1998.
[11] T. van Gestel, J. A. K. Suykens, B. de Moor and J. Vandewalle. Automatic relevance determination for least squares support vector machine classifiers. In M. Verleysen, editor, European Symposium on Artificial Neural Networks, 13-18, 2001.
[12] H. Vafaie. Robust feature selection algorithms. In Proc. 5th Intl. Conf. on Tools with Artificial Intelligence, 1993.
[13] C. R. Houck, J. A. Joines and M. G. Kay. A genetic algorithm for function optimization: a matlab implementation. Technical report, North Carolina State University, 1996.
[14] L. J. Herrera. Effective input variable selection for function approximation. In Lecture Notes in Computer Science.
[15] S. Perkins and J. Theiler. Online feature selection using grafting.
[16] S. Perkins, K. Lacker and J. Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 2003.
[17] A. Navot, L. Shpigelman, N. Tishby and E. Vaadia. Nearest neighbor based feature selection for regression and its application to neural activity. In NIPS, 2005.
[18] S. Wold, L. Eriksson, J. Trygg and N. Kettaneh. The PLS method - partial least squares projection to latent structures - and its applications in industrial RDP (research, development and production).
[19] A. Lendasse, F. Corona, J. Hao, N. Reyhani and M. Verleysen. Determination of the Mahalanobis matrix using nonparametric noise estimations. ESANN 2006.

Most of the references can be found on the internet; the rest are available on request.

Project work

Information about the project is available here.

For more information, please send email to eliitiai (at) cc.hut.fi or lendasse (at) hut.fi.


Page maintained by webmaster at cis.hut.fi, last updated Monday, 29-Jan-2007 10:21:00 EET
