Laboratory of Computer and Information Science

Tik-61.181 Special Course in Information Science I

Lecturer: professor Olli Simula
Assistants: Esa Alhoniemi, Juha Vesanto
Semester: autumn 1999
Credit points: 4 cr
Place: seminar room B333 in the computer science building
Time: Wednesdays 14-16, starting from September 15th
Language: English

Data Preparation for Data Mining

The seminar investigates one of the basic problems of data mining: data preparation (preprocessing). The primary course book is Dorian Pyle's new book, "Data Preparation for Data Mining". The book costs about $40 plus postage from Amazon.

Practical arrangements

See the beginning of the page.

Course assistants are Juha Vesanto and Esa Alhoniemi.

To pass the course (4 cr) you have to:

To pass the course "with distinction", at least 14 of the home exercises should be solved, and the presentation and practical exercise should be very good.


Date | Name of presenter | Material | Subject
15.9. | Juha Vesanto | | Introduction to the course (MS-PowerPoint slides)
22.9. | Juha Vesanto | | A good presentation (MS-PowerPoint slides)
 | Juha Vesanto | Chapter 1 | Introduction to data mining (MS-PowerPoint slides)
 | Esa Alhoniemi | | Example of a data mining project (MS-PowerPoint slides)
29.9. | Jani Mattsson | Chapter 2 | Types of measurements (MS-PowerPoint slides)
6.10. | Markku Ursin | Chapter 3 | The process of data preparation (MS-PowerPoint slides)
 | Markku Roiha | Chapter 4 | Basic preparation (MS-PowerPoint slides)
13.10. | Peng Chengyuan | Chapter 5.1-5.4 | Sampling and variability (MS-PowerPoint slides)
 | Mika Raivio | Chapter 5.5-5.9 | Confidence (MS-PowerPoint slides)
20.10. | Ville Makkonen | Chapter 6.1-6.2 | Handling nonnumerical variables (1) (MS-PowerPoint slides)
 | Kenrick Bingham | Chapter 6.3-6.6 | Handling nonnumerical variables (2) (MS-PowerPoint slides)
 | Vuokko Vuori | Additional material | Multidimensional scaling (MS-PowerPoint slides)
27.10. | Yuan Zhijian | Additional material | Projection methods
 | Markus Koskela | Chapter 7 | Normalizing and redistributing variables (MS-PowerPoint slides)
 | Jukka Parviainen | Chapter 8 + additional material | Missing and empty values (MS-PowerPoint slides)
3.11. | Ella Bingham | Chapter 9.1-9.5 | Series variables (1) (PostScript slides)
 | Simona Malaroiu | Chapter 9.6-9.9 | Series variables (2)
 | Olli Saarela | Heikki Jokinen, "Disturbance Detection and Suppression in Signal Preprocessing" | Series variables (3) (MS-PowerPoint slides)
10.11. | Ville Viitaniemi | Chapter 10.1-10.4 | Sparse data, data compression (1) (MS-PowerPoint slides)
 | Markus Siponen | Chapter 10.5-10.9 | Sparse data, data compression (2) (MS-PowerPoint slides)
 | Mikko Syrjälahti | Chapman et al, "CRISP-DM Process Model" | CRISP model for data mining (zipped PostScript slides)
17.11. | | | no lecture
24.11. | Ari Niinistö | Chapter 11.1-11.4 | Data survey (1) (MS-PowerPoint slides)
 | Martti Kesäniemi | Chapter 11.5-11.9 | Data survey (2) (MS-PowerPoint slides)
 | Jussi Ahola | Chapter 11; supplemental material | Data survey (3) (MS-PowerPoint slides)
 | Johan Himberg | Additional material | Clustering (MS-PowerPoint slides)
 | Heikki Mannila | Guest lecture in T4 at 16:15- | Sequences and episodes: A case study of data mining in telecommunications
1.12. | Jaakko Leppänen | Additional material | Graphical examination of data (MS-PowerPoint slides)
 | Mihai Enescu | Additional material | Data warehousing (PostScript version of the report)
16.1.2000 | | | Deadline for home exercises and practical assignment
19.1.2000 | Esa Alhoniemi, Juha Vesanto | | Post-seminar lecture on the results of home exercises

Additional materials, available from the assistants, are listed below. Feel free to use your own sources.

Home exercises, practical assignment

Exercises are available from the course exercises page.

The deadline for the home exercises and the practical assignment is January 16th 2000. They can be returned to the assistants (room B313), the lecturer, or the secretary of the Laboratory of Computer and Information Science.

There will be a separate lecture on January 19th 2000, where the (correct) results of the home exercises and the practical assignment will be presented and discussed.

December 1st, 1999
Juha Vesanto