 
 
Courses in previous years: 1998, 1997, 1996
This seminar investigates one of the basic problems of data mining: data preparation (preprocessing). The primary course book is Dorian Pyle's new book "Data Preparation for Data Mining". The book costs about $40 plus postage from Amazon.
See the beginning of the page.
Course assistants are Juha Vesanto and Esa Alhoniemi.
To pass the course (4 cr) you have to:
| Date | Name of presenter | Material | Subject |
|---|---|---|---|
| 15.9. | Juha Vesanto | | Introduction to the course (MS-PowerPoint slides) |
| 22.9. | Juha Vesanto | | A good presentation (MS-PowerPoint slides) |
| | Juha Vesanto | Chapter 1 | Introduction to data mining (MS-PowerPoint slides) |
| | Esa Alhoniemi | | Example of a data mining project (MS-PowerPoint slides) |
| 29.9. | Jani Mattsson | Chapter 2 | Types of measurements (MS-PowerPoint slides) |
| 6.10. | Markku Ursin | Chapter 3 | The process of data preparation (MS-PowerPoint slides) |
| | Markku Roiha | Chapter 4 | Basic preparation (MS-PowerPoint slides) |
| 13.10. | Peng Chengyuan | Chapter 5.1-5.4 | Sampling and variability (MS-PowerPoint slides) |
| | Mika Raivio | Chapter 5.5-5.9 | Confidence (MS-PowerPoint slides) |
| 20.10. | Ville Makkonen | Chapter 6.1-6.2 | Handling nonnumerical variables (1) (MS-PowerPoint slides) |
| | Kenrick Bingham | Chapter 6.3-6.6 | Handling nonnumerical variables (2) (MS-PowerPoint slides) |
| | Vuokko Vuori | Additional material | Multidimensional scaling (MS-PowerPoint slides) |
| 27.10. | Yuan Zhijian | Additional material | Projection methods |
| | Markus Koskela | Chapter 7 | Normalizing and redistributing variables (MS-PowerPoint slides) |
| | Jukka Parviainen | Chapter 8 + additional material | Missing and empty values (MS-PowerPoint slides) |
| 3.11. | Ella Bingham | Chapter 9.1-9.5 | Series variables (1) (PostScript slides) |
| | Simona Malaroiu | Chapter 9.6-9.9 | Series variables (2) |
| | Olli Saarela | Heikki Jokinen, "Disturbance Detection and Suppression in Signal Preprocessing" | Series variables (3) (MS-PowerPoint slides) |
| 10.11. | Ville Viitaniemi | Chapter 10.1-10.4 | Sparse data, data compression (1) (MS-PowerPoint slides) |
| | Markus Siponen | Chapter 10.5-10.9 | Sparse data, data compression (2) (MS-PowerPoint slides) |
| | Mikko Syrjälahti | Chapman et al., "CRISP-DM Process Model" | CRISP model for data mining (zipped PostScript slides) |
| 17.11. | no lecture | | |
| 24.11. | Ari Niinistö | Chapter 11.1-11.4 | Data survey (1) (MS-PowerPoint slides) |
| | Martti Kesäniemi | Chapter 11.5-11.9 | Data survey (2) (MS-PowerPoint slides) |
| | Jussi Ahola | Chapter 11; supplemental material | Data survey (3) (MS-PowerPoint slides) |
| | Johan Himberg | Additional material | Clustering (MS-PowerPoint slides) |
| | Heikki Mannila | Guest lecture in T4 at 16:15- | Sequences and episodes: A case study of data mining in telecommunications |
| 1.12. | Jaakko Leppänen | Additional material | Graphical examination of data (MS-PowerPoint slides) |
| written report | Mihai Enescu | Additional material | Data warehousing (PostScript version of the report) |
| 16.1.2000 | Deadline for home exercises and practical assignment | | |
| 19.1.2000 | Esa Alhoniemi, Juha Vesanto | | Post-seminar lecture on the results of home exercises |
Additional materials, available from the assistants, are listed below. Feel free to use your own sources.
Exercises are available from the course exercises page.
The deadline for the home exercises and the practical assignment is January 16th 2000. They can be returned to the assistants (room B313), the lecturer, or the secretary of the Laboratory of Computer and Information Science.
There will be a separate lecture on January 19th 2000, at which the (correct) results of the home exercises and the practical assignment will be presented and discussed.