T-122.103 Exercise project, fall 2004
To pass the course you must pass the examination and complete this
exercise project.
The purpose of this exercise project is to give you hands-on experience
with, and a deeper understanding of, some aspects of data mining. You will
also gain some experience in designing experiments (finding the right
questions and a suitable data set, and interpreting the results).
To complete the project - depending on your particular choices -
you may need information that has not been discussed in the lectures.
If you have any questions please do not hesitate to contact the
assistant or the lecturer for hints, guidance or references.
The project work will be valid in all examinations for one year after
the original deadline.
General requirements
- The project should be completed by one person. However, discussing
it with others is encouraged.
- You have to submit a short project report in which you describe the
work that you have done and your conclusions.
- The project reports must be submitted by 17 January 2005 at the
latest. Reports submitted after the deadline will be rejected. If
you have a very good reason for missing the deadline, you can request
an extension. The extension must be requested before the deadline.
- To pass the project you must fulfill the requirements given in the
"specific requirements" section.
- If you submit the project report on time but do not pass, you will
have a chance to supplement your work after the deadline.
- Each project report should contain a section that comments on the
difficulty of the project and gives an estimate of the time spent
completing it.
- Please note that we need to be able to open and print the documents
you send us and if your project work includes program code we must be
able to run it. Please take this into account when planning your
project, and ask us if you are uncertain. In particular, you should
contact us if accessing your project work requires proprietary software.
The preferred document formats are PostScript and PDF.
- The project reports should be submitted by email
(t122103@james.hut.fi). If email submission is not possible, you can
use, for example, (internal) mail (Kai Puolamäki, PL 5400, 02015 TKK).
The project reports should contain your name, email address and full
student number. If you submit your work by email, please also include
your student number in the subject line.
Specific requirements
One of the following:
- Design a model for generated 0-1 data sets that takes into account
skewness in attribute distribution and allows one to incorporate at
least some type of dependencies between variables. Test some publicly
available method for finding frequent sets on your data, and report
the results.
What can you conclude from your experiments? How well do the results
generalize to real-world situations? Can you suggest any
improvements to the algorithms?
- Implement an algorithm for finding sequential episodes from
sequences of events, and test the method on generated data. Report the
results.
What can you conclude from your experiments? How well do the results
generalize to real-world situations? Can you suggest any
improvements to the algorithm?
- Write a 3-5 page summary of the paper R. Bayardo: Efficiently
Mining Long Patterns from Databases,
http://www.informatik.uni-trier.de/~ley/db/conf/sigmod/Bayardo98.html
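As a starting point for the first option, the following is a minimal
sketch of one possible generator for 0-1 data, not a prescribed design.
It assumes a power-law decay for the attribute probabilities (the
skewness) and a single "planted" itemset that is switched on jointly in
a fraction of the rows (the dependency). The function names and the
parameters p_max, alpha, pattern and pattern_prob are all illustrative
assumptions, and a real project would likely need a richer dependency
model.

```python
import random


def item_probabilities(n_items, p_max=0.5, alpha=1.0):
    """Skewed marginals (assumed form): item j gets p_max / (j + 1) ** alpha."""
    return [p_max / (j + 1) ** alpha for j in range(n_items)]


def generate_rows(n_rows, n_items, pattern, pattern_prob, seed=0):
    """Generate 0-1 rows with skewed marginals and one planted dependency.

    Each item is first drawn independently from its own probability.
    Then, with probability pattern_prob, every item in `pattern` is set
    to 1 in that row, which makes those items co-occur more often than
    independence would predict.
    """
    rng = random.Random(seed)
    probs = item_probabilities(n_items)
    rows = []
    for _ in range(n_rows):
        row = [1 if rng.random() < p else 0 for p in probs]
        if rng.random() < pattern_prob:
            for j in pattern:
                row[j] = 1
        rows.append(row)
    return rows


def support(rows, itemset):
    """Fraction of rows in which every item of `itemset` equals 1."""
    hits = sum(1 for row in rows if all(row[j] for j in itemset))
    return hits / len(rows)


if __name__ == "__main__":
    rows = generate_rows(2000, 10, pattern=(7, 8, 9), pattern_prob=0.1, seed=1)
    print("support of item 0:          ", support(rows, (0,)))
    print("support of planted (7, 8, 9):", support(rows, (7, 8, 9)))
```

Comparing the observed support of the planted itemset with the product
of the individual item supports gives a quick check that the dependency
is actually present in the generated data before a frequent set miner is
run on it.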
If you are interested in doing the project on a topic of your
own, please contact Kai Puolamäki at t122103@james.hut.fi.