
T-122.103 Exercise project, fall 2004

To pass the course you must pass the examination and complete this exercise project.

The purpose of this exercise project is to give you hands-on experience and a deeper understanding of some aspects of data mining. You will also gain some experience in designing experiments (finding the right questions, choosing a suitable data set and interpreting the results).

Depending on your particular choices, you may need information that has not been discussed in the lectures to complete the project. If you have any questions, please do not hesitate to contact the assistant or the lecturer for hints, guidance or references.

The project work will be valid in all examinations for one year after the original deadline.

General requirements

  1. The project should be completed by one person. However, discussing it with others is encouraged.
  2. You have to submit a short project report in which you describe the work that you have done and your conclusions.
  3. The project reports must be submitted by 17 January 2005 at the latest. Reports submitted after the deadline will be rejected. If you have a very good reason for missing the deadline, you can request an extension. The extension must be requested before the deadline.
  4. To pass the project you must fulfill the requirements given in the "specific requirements" section.
  5. If you submit the project report on time but do not pass, you will have a chance to supplement your work after the deadline.
  6. Each project report should contain a section that comments on the difficulty of the project and an estimate of the time used for completing it.
  7. Please note that we need to be able to open and print the documents you send us, and if your project work includes program code, we must be able to run it. Please take this into account when planning your project, and ask us if you are uncertain. You should contact us especially if accessing your project work will require proprietary software. The preferred document formats are PostScript and PDF.
  8. The project reports should be submitted by email (t122103@james.hut.fi). If email submission is not possible, you can use, e.g., (internal) mail (Kai Puolamäki, PL 5400, 02015 TKK). The project reports should contain your name, email address and full student number. If you submit your work by email, please also include your student number in the subject line.

Specific requirements

One of the following:

  1. Design a model for generating 0-1 data sets that takes into account skewness in the attribute distribution and allows one to incorporate at least some types of dependencies between variables. Test some publicly available method for finding frequent sets on your data, and report the results.
    What can you conclude from your experiments? How well do the results generalize to real-world situations? Can you suggest any improvements to the algorithms?
  2. Implement an algorithm for finding sequential episodes from sequences of events, and test the method on generated data. Report the results.
    What can you conclude from your experiments? How well do the results generalize to real-world situations? Can you suggest any improvements to the algorithms?
  3. Write a 3-5 page summary of the paper R. Bayardo: Efficiently Mining Long Patterns from Databases, http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/Bayardo98.html
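
As a starting point for option 1 above, the following is a minimal sketch of one possible generator, not a prescribed model. All names and parameters (generate_binary_data, skew, p_max, patterns, pattern_prob) are hypothetical and chosen for illustration only. Skewness is modelled here by letting attribute j be 1 with probability p_max / (j + 1) ** skew, and dependencies by "planting" groups of attributes that are switched on together in a row. Other models may suit your experiments better.

```python
import random


def generate_binary_data(n_rows, n_attrs, skew=1.0, p_max=0.5,
                         patterns=(), pattern_prob=0.2, seed=0):
    """Return n_rows rows, each a list of n_attrs values in {0, 1}.

    Assumed model (illustrative only):
    - skewness: attribute j is set independently with probability
      p_max / (j + 1) ** skew, so low-index attributes are frequent;
    - dependencies: each pattern (a tuple of attribute indices) is
      switched on as a whole in a row with probability pattern_prob.
    """
    rng = random.Random(seed)  # fixed seed makes experiments repeatable
    probs = [p_max / (j + 1) ** skew for j in range(n_attrs)]
    rows = []
    for _ in range(n_rows):
        row = [1 if rng.random() < p else 0 for p in probs]
        for pattern in patterns:
            if rng.random() < pattern_prob:
                for j in pattern:
                    row[j] = 1
        rows.append(row)
    return rows


def support(rows, itemset):
    """Fraction of rows in which every attribute of itemset is 1."""
    hits = sum(1 for row in rows if all(row[j] for j in itemset))
    return hits / len(rows)


if __name__ == "__main__":
    data = generate_binary_data(2000, 10, patterns=[(7, 8, 9)], seed=1)
    # The planted set should be far more frequent than independence predicts.
    print(support(data, (7, 8, 9)))
```

Comparing the support of a planted set with the product of its single-attribute supports is one simple way to check that the dependency is actually present before running a frequent-set method on the data.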

If you are interested in doing the project on a topic of your own, please contact Kai Puolamäki at t122103@james.hut.fi.
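
For option 2 in the list above, the sketch below illustrates one way to count and enumerate serial episodes; it is an assumption-laden illustration, not the required method. It assumes that events are (time, type) pairs with integer times sorted by time, that the frequency of an episode is the fraction of sliding windows of a given width in which its event types occur in order, and that candidates are grown level by level. The function names (occurs_in_order, window_frequency, frequent_serial_episodes) are hypothetical, and the implementation favours clarity over speed.

```python
def occurs_in_order(event_types, episode):
    """True if the types in episode appear in event_types in this order."""
    pos = 0
    for e in event_types:
        if pos < len(episode) and e == episode[pos]:
            pos += 1
    return pos == len(episode)


def window_frequency(events, episode, width):
    """Fraction of windows [start, start + width) containing the episode.

    events: list of (time, type) pairs, sorted by integer time.
    Windows are slid over every start for which the window overlaps
    the span of the sequence (a simplifying assumption).
    """
    first, last = events[0][0], events[-1][0]
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [typ for t, typ in events if start <= t < start + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / len(starts)


def frequent_serial_episodes(events, width, min_freq, max_len=3):
    """Level-wise search: extend frequent episodes by frequent event types."""
    types = sorted({typ for _, typ in events})
    result = {}
    level = [(a,) for a in types]
    frequent_types = []
    for length in range(1, max_len + 1):
        kept = []
        for ep in level:
            freq = window_frequency(events, ep, width)
            if freq >= min_freq:
                result[ep] = freq
                kept.append(ep)
        if length == 1:
            frequent_types = [ep[0] for ep in kept]
        # A frequent episode must have a frequent prefix and last type.
        level = [ep + (a,) for ep in kept for a in frequent_types]
        if not level:
            break
    return result


if __name__ == "__main__":
    seq = [(0, "A"), (1, "B"), (2, "A"), (3, "B")]
    print(frequent_serial_episodes(seq, width=2, min_freq=0.3, max_len=2))
```

The pruning step relies on the observation that any window containing an episode also contains each of its prefixes, so an infrequent episode never needs to be extended.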



http://www.cis.hut.fi/Opinnot/T-61.5060/2004/harjoitustyo.shtml
t122103@james.hut.fi
Friday, 20-Aug-2004 19:32:35 EEST