Laboratory of Computer and Information Science / Neural Networks Research Centre, Helsinki University of Technology




T-61.6020 Special Course in Computer and Information Science II P : Popular Algorithms in Data Mining and Machine Learning


Lecturer: M.Sc. Nikolaj Tatti
Assistant: M.Sc. Sami Hanhijärvi
Credits (ECTS): 5
Semester: Spring 2008
Seminar sessions: On Wednesdays at 14-16 in Lecture Hall T5, Computer Science building,
Konemiehentie 2, Otaniemi, Espoo. The first lecture is on January 23rd, 2008.
Language: English
Web: http://www.cis.hut.fi/Opinnot/T-61.6020/
E-mail: t616020@cis.hut.fi

Introduction

The goal of this hands-on course is to introduce and to implement some of the most popular algorithms in Data Mining and Machine Learning. The algorithms deal with classification, clustering, link analysis and pattern discovery. See the topic list below.

The first, introductory lecture will be given on 23.01.2008. There will be no lecture on 30.01.2008. The rest of the lectures will consist of presentations of the selected topics by the participants of the seminar.

Prerequisites

The course prerequisites include background in algorithms, probability theory, linear algebra, and some optimization theory. Students are assumed to have rudimentary knowledge of Matlab and Python.

Requirements for passing the course

To pass the course a student must actively attend the lectures, give one presentation (based on material chosen from the list), and complete all the homeworks. There will be about 10 homeworks, one for each algorithm. In each homework one should implement the corresponding algorithm and test it with the given data. The student should write a short report (about 2-3 pages) for each homework, explaining the algorithm and the results. Stubs for the algorithms (either in Matlab or Python, depending on the algorithm) will be provided so that students can concentrate on the core of the algorithm.
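
For a concrete picture of the workflow, a completed core for a hypothetical kNN homework might look roughly like the sketch below (in Python; the function name, arguments, and data format are illustrative assumptions, not the actual stub distributed on this page).

    from collections import Counter
    import numpy as np

    def knn_classify(train_X, train_y, test_X, k=5):
        """Predict a label for each row of test_X by majority vote among
        the k nearest rows of train_X (Euclidean distance)."""
        predictions = []
        for x in test_X:
            # Squared Euclidean distance from x to every training point.
            dists = np.sum((train_X - x) ** 2, axis=1)
            # Labels of the k closest training points.
            nearest_labels = [train_y[i] for i in np.argsort(dists)[:k]]
            # Majority vote over those labels.
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return np.array(predictions)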

Homeworks

There are 10 homeworks, one for each algorithm, and they are divided into two sets. The descriptions of the homeworks can be downloaded from here: [First set], [Second set]. The deadline for the first set is 26.3 and the deadline for the second set is 21.5.

The stubs can be downloaded from here: [tar.gz], [zip].

The toy datasets can be downloaded from here: [tar.gz], [zip].

Presentations

The following table presents the dates for the different topics and the respective presenters. Please note that the lecture slides should be sent to the course address t616020@cis.hut.fi one week before the session so that the course organizers can comment on them and provide some hints concerning the slides.

Date   Name                  Topic              Slides
23.1.  Nikolaj Tatti         Introduction       [PDF]
6.2.   Adam Gyenge           Decision trees     [PDF]
13.2.  Lasse Kärkkäinen      Mixture models     [PDF]
20.2.  Laszlo Kozma          kNN                [PDF]
27.2.  Jarno Seppänen        EM                 [PDF]
5.3.   Luis Gabriel de Alba  AdaBoost           [PDF]
12.3.  Lauri Lahti           APriori            [PPT] [PDF]
19.3.  Joni Pajarinen        PageRank/HITS      [PDF]
2.4.   Ville Lämsä           K-Means/Spectral   [PDF]
9.4.   Oskar Kohonen         FP-Tree            [PDF]
16.4.  Stevan Keraudy        SVM                [PDF]
16.4.  Dusan Sovilj          gSpan              [PDF]
23.4.  Sami Virpioja         BIRCH              [PDF]
23.4.  Ari Nevalainen        PrefixSpan         [PDF]

Topics

The following is a selection of topics for the seminar on Popular Algorithms. The list is based heavily on the Top 10 Algorithms in Data Mining from ICDM 2006. Each participant should select one topic for his/her presentation. Available topics will be assigned at the first lecture to participants who do not already have a topic. If we run out of topics, you can e-mail us (t616020@cis.hut.fi) and we will provide additional candidate topics for the presentation. A short illustrative code sketch of one of the topics is given after the list.

Please inform us of your preferred topic and preferred time slots for your presentation by sending an e-mail to t616020@cis.hut.fi. Papers will be handed out on a first-come, first-served basis. At the moment, all available topics are taken.

Note that some papers are unavailable outside the HUT domain.

Decision Trees (ID3 and pruning methods)
K Nearest Neighbours (kNN)
  • Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI). 18, 6 (Jun. 1996), 607-616. http://dx.doi.org/10.1109/34.506411
  • Duda, Hart, and Stork. Pattern Classification, 2nd ed. Wiley-Interscience, 2000. Sections 4.4 - 4.6.
  • Hand, Mannila and Smyth. Principles of Data Mining. MIT, 2000. Section 10.6.
Naive Bayes / Chow-Liu Tree Model (mixture models)
SVM
Expectation Maximization (EM)
  • Jeff A. Bilmes. "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", International Computer Science Institute, 1998. http://scipp.ucsc.edu/groups/retina/articles/bilmes98gentle.pdf
    Discussion of sections 1-3 is sufficient.
  • Russell and Norvig. Artificial Intelligence: A Modern Approach, 2nd ed. Prentice-Hall, 2003. Section 20.3.
  • Duda, Hart, and Stork. Pattern Classification, 2nd ed. Wiley-Interscience, 2000. Section 3.9.
  • Hand, Mannila and Smyth. Principles of Data Mining. MIT, 2000. Section 8.4.
APriori
FP-Tree
  • Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD international Conference on Management of Data (Dallas, Texas, United States, May 15 - 18, 2000). SIGMOD '00. ACM Press, New York, NY, 1-12. http://doi.acm.org/10.1145/342009.335372
PageRank / HITS
  • Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (San Francisco, California, United States, January 25 - 27, 1998). Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 668-677. http://www.cs.cornell.edu/home/kleinber/auth.pdf
  • Corso, Gullí and Romani. Fast PageRank Computation via a Sparse Linear System (Extended Abstract). http://citeseer.ist.psu.edu/719287.html
K-Means / Spectral Clustering
AdaBoost
BIRCH (extra topic)
  • Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: an efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD international Conference on Management of Data (Montreal, Quebec, Canada, June 04 - 06, 1996). J. Widom, Ed. SIGMOD '96. ACM Press, New York, NY, 103-114. DOI= http://doi.acm.org/10.1145/233269.233324
gSpan (extra topic)
  • Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM '02) (December 09 - 12, 2002). IEEE Computer Society, Washington, DC. http://citeseer.ist.psu.edu/yan02gspan.html
PrefixSpan (extra topic)
  • J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proceedings of the 17th international Conference on Data Engineering (April 02 - 06, 2001). ICDE '01. IEEE Computer Society, Washington, DC. http://www-sal.cs.uiuc.edu/~hanj/pdf/span01.pdf
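
As a rough indication of the scale of implementation expected per topic, the following is a minimal sketch of PageRank by power iteration (one of the topics above), written in Python. The damping factor, the handling of dangling pages, and the convergence test shown here are common textbook choices, not requirements of the course.

    import numpy as np

    def pagerank(adj, damping=0.85, tol=1e-9, max_iter=100):
        """PageRank by power iteration. adj[i, j] = 1 if page i links to page j."""
        n = adj.shape[0]
        # Row-normalise links into a transition matrix; pages with no
        # out-links are treated as linking uniformly to every page.
        out_degree = adj.sum(axis=1)
        P = np.where(out_degree[:, None] > 0,
                     adj / np.maximum(out_degree[:, None], 1),
                     1.0 / n)
        rank = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            # Random surfer: teleport with probability 1 - damping,
            # otherwise follow an out-link of the current page.
            new_rank = (1 - damping) / n + damping * (P.T @ rank)
            if np.abs(new_rank - rank).sum() < tol:
                rank = new_rank
                break
            rank = new_rank
        return rank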

