Exercise work 2007

To pass the course you must pass the examination and complete this exercise work.

To complete the exercise work you may need information that has not been discussed in the lectures. If you have any questions please do not hesitate to contact the assistant or the lecturer for hints, guidance or references.

The exercise work will be graded as either passed or rejected. A passed exercise work remains valid for one year after the original deadline.

If you want to receive your grade at the same time as the December examination results, you should submit the exercise work by 15 January 2008. The grades will be announced on the notice board (in the blue binder) and by email to an address of the form, where 12345X is your student number.

General requirements

  1. The exercise work should be completed by one person. However, discussing it with others is encouraged.
  2. You have to submit a report in which you describe the work that you have done and your conclusions.
  3. Reports must be received by the examiner by 15 January 2008 at the latest. Reports received after the deadline will be rejected. If you have a very good reason for missing the deadline, you can request an extension. The extension must be requested at least one working day before the deadline.
  4. To pass the exercise work you must fulfill the requirements given in the "specific requirements" section.
  5. If you submit the exercise work report on time and have honestly tried to satisfy the requirements (for example, your submission is not essentially empty) but do not pass, you will be given instructions on how to supplement your work and a new deadline for doing so.
  6. Accepted languages are Finnish, Swedish and English.
  7. There are no strict formatting rules and no preferred typesetting or word processing system.
  8. Each exercise work report should contain a section that comments on the difficulty of the project and gives an estimate of the time spent completing it.
  9. The exercise work reports should be submitted by email in PDF format. If PDF or email submission is not possible, you can submit a printed copy by (internal) mail (Niko Vuokko, PL 5400, 02015 TKK). The exercise work reports should contain your name, email address and full student number. If you submit your work by email, please also include the student number in the subject line.

Specific requirements

Write a 3-5 page summary of one of the following papers from the KDD'07 conference:

  1. Deepak Agarwal, Dhiman Barman, Dimitrios Gunopulos, Neal E. Young, Flip Korn, Divesh Srivastava: Efficient and effective explanation of change in hierarchical summaries. 6-15
  2. Deepak Agarwal, Andrei Z. Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian: Estimating rates of rare events at multiple resolutions. 16-25
  3. Charu C. Aggarwal, Philip S. Yu: On string classification in data streams. 36-45
  4. Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, Andrew McCallum: Canonicalization of database records using adaptive similarity measures. 201-209
  5. Kaustav Das, Jeff G. Schneider: Detecting anomalous records in categorical datasets. 220-229
  6. Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, Philip S. Yu: Time-dependent event hierarchy construction. 300-309

You can get the articles from this link when inside the TKK network (and probably from outside it as well).


Implement either the count-min data structure or the k-means++ algorithm, design and run some experiments that you find interesting, and report the results in about 3-5 pages.
You can use any language you like, but Matlab is probably the easiest choice.
Include your source files and any non-trivial compilation instructions (preferably a makefile) as separate attachments to the report.
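
To give an idea of the scale of the implementation task, here is a minimal sketch of a count-min data structure in Python. It is only an illustration and not a model solution: the class name CountMinSketch, its parameters (width, depth, seed) and the choice of hash family ((a*x + b) mod p) mod width are assumptions made for this example, and you are free to design your own version in any language.

```python
import random


class CountMinSketch:
    """Approximate frequency counter; estimates never fall below the true count."""

    # Large Mersenne prime used as the modulus of the hash family (an assumption
    # of this sketch, not something the assignment prescribes).
    _PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # depth rows of width counters, all starting at zero.
        self.table = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row for h(x) = ((a*x + b) mod p) mod width.
        self.params = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]

    def _columns(self, item):
        # Map the item to one column index per row.
        x = hash(item) % self._PRIME
        for a, b in self.params:
            yield ((a * x + b) % self._PRIME) % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row, col in zip(self.table, self._columns(item)):
            row[col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(row[col] for row, col in zip(self.table, self._columns(item)))


if __name__ == "__main__":
    sketch = CountMinSketch(width=200, depth=5, seed=42)
    rng = random.Random(1)
    stream = [rng.randrange(1000) for _ in range(10000)]
    for value in stream:
        sketch.add(value)
    for value in (0, 1, 2):
        print(value, "true:", stream.count(value), "estimate:", sketch.estimate(value))
```

Because the counts are non-negative, an estimate can overshoot the true count but never undershoot it. One possible experiment is to vary width and depth and measure how the overestimation error changes on streams with different frequency distributions.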

