Courses in previous years: 1998, 1997, 1996
The seminar investigates one of the basic problems of data mining: data preparation (preprocessing). The primary course book is a new book by Dorian Pyle, "Data Preparation for Data Mining". The book costs about $40 plus postage from Amazon.
See the beginning of the page.
Course assistants are Juha Vesanto and Esa Alhoniemi.
To pass the course (4 cr) you have to:
Date | Name of presenter | Material | Subject | Slides |
---|---|---|---|---|
15.9. | Juha Vesanto | | Introduction to the course | MS-PowerPoint slides |
22.9. | Juha Vesanto | | A good presentation | MS-PowerPoint slides |
22.9. | Juha Vesanto | Chapter 1 | Introduction to data mining | MS-PowerPoint slides |
22.9. | Esa Alhoniemi | | Example of a data mining project | MS-PowerPoint slides |
29.9. | Jani Mattsson | Chapter 2 | Types of measurements | MS-PowerPoint slides |
6.10. | Markku Ursin | Chapter 3 | The process of data preparation | MS-PowerPoint slides |
6.10. | Markku Roiha | Chapter 4 | Basic preparation | MS-PowerPoint slides |
13.10. | Peng Chengyuan | Chapter 5.1-5.4 | Sampling and variability | MS-PowerPoint slides |
13.10. | Mika Raivio | Chapter 5.5-5.9 | Confidence | MS-PowerPoint slides |
20.10. | Ville Makkonen | Chapter 6.1-6.2 | Handling nonnumerical variables (1) | MS-PowerPoint slides |
20.10. | Kenrick Bingham | Chapter 6.3-6.6 | Handling nonnumerical variables (2) | MS-PowerPoint slides |
20.10. | Vuokko Vuori | Additional material | Multidimensional scaling | MS-PowerPoint slides |
27.10. | Yuan Zhijian | Additional material | Projection methods | |
27.10. | Markus Koskela | Chapter 7 | Normalizing and redistributing variables | MS-PowerPoint slides |
27.10. | Jukka Parviainen | Chapter 8 + additional material | Missing and empty values | MS-PowerPoint slides |
3.11. | Ella Bingham | Chapter 9.1-9.5 | Series variables (1) | PostScript slides |
3.11. | Simona Malaroiu | Chapter 9.6-9.9 | Series variables (2) | |
3.11. | Olli Saarela | Heikki Jokinen, "Disturbance Detection and Suppression in Signal Preprocessing" | Series variables (3) | MS-PowerPoint slides |
10.11. | Ville Viitaniemi | Chapter 10.1-10.4 | Sparse data, data compression (1) | MS-PowerPoint slides |
10.11. | Markus Siponen | Chapter 10.5-10.9 | Sparse data, data compression (2) | MS-PowerPoint slides |
10.11. | Mikko Syrjälahti | Chapman et al., "CRISP-DM Process Model" | CRISP model for data mining | zipped PostScript slides |
17.11. | no lecture | | | |
24.11. | Ari Niinistö | Chapter 11.1-11.4 | Data survey (1) | MS-PowerPoint slides |
24.11. | Martti Kesäniemi | Chapter 11.5-11.9 | Data survey (2) | MS-PowerPoint slides |
24.11. | Jussi Ahola | Chapter 11; supplemental material | Data survey (3) | MS-PowerPoint slides |
24.11. | Johan Himberg | Additional material | Clustering | MS-PowerPoint slides |
24.11. | Heikki Mannila | Guest lecture in T4 at 16:15- | Sequences and episodes: A case study of data mining in telecommunications | |
1.12. | Jaakko Leppänen | Additional material | Graphical examination of data | MS-PowerPoint slides |
written report | Mihai Enescu | Additional material | Data warehousing | PostScript version of the report |
16.1.2000 | Deadline for home exercises and practical assignment | | | |
19.1.2000 | Esa Alhoniemi, Juha Vesanto | | Post-seminar lecture on the results of home exercises | |
Additional materials, available from the assistants, are listed below. Feel free to use your own sources.
Exercises are available from the course exercises page.
The deadline for the home exercises and the practical assignment is January 16th, 2000. They can be returned to the assistants (room B313), the lecturer, or the secretary of the Laboratory of Computer and Information Science.
A separate lecture will be held on January 19th, 2000, at which the (correct) results of the home exercises and the practical assignment will be presented and discussed.