Laboratory of Computer and Information Science / Neural Networks Research Centre

Helsinki University of Technology → Department of Computer Science and Engineering →
Laboratory of Computer and Information Science → Teaching →
T-61.3050 Machine Learning: Basic Principles → 2007 → Project - Web spam detection, Submission instructions

Project - Web spam detection - Submission instructions

Save your predictions in a text file. Please make sure the file is encoded in either latin-1 (preferred) or utf-8. This shouldn't be a problem if you use one of the university computers to prepare the file.

The contents of the file are to be formatted as follows:

predicted_class predicted_spammicity
predicted_class predicted_spammicity
...
predicted_class predicted_spammicity

Separate the column values with whitespace; do NOT use commas or other characters as delimiters. For example, if you predict the host as normal with a spammicity score of, say, 0.4, use the line:

normal 0.4
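The format described above can be produced and checked with a short script. The following is only a minimal sketch in Python, not an official course tool: the function names `write_predictions` and `check_predictions`, the file name `predictions.txt` and the class label `spam` are assumptions made for illustration (these instructions show only the label `normal`).

```python
from pathlib import Path


def write_predictions(path, predictions, encoding="latin-1"):
    """Write one 'predicted_class predicted_spammicity' pair per line.

    `predictions` is an iterable of (label, score) tuples. latin-1 is used
    by default because the instructions name it as the preferred encoding.
    """
    lines = [f"{label} {score}" for label, score in predictions]
    Path(path).write_text("\n".join(lines) + "\n", encoding=encoding)


def check_predictions(path, encoding="latin-1"):
    """Return a list of (line_number, problem) tuples; empty if none found."""
    problems = []
    text = Path(path).read_text(encoding=encoding)
    for number, line in enumerate(text.splitlines(), start=1):
        if "," in line:
            problems.append((number, "comma found; use whitespace only"))
            continue
        fields = line.split()
        if len(fields) != 2:
            problems.append((number, "expected exactly two columns"))
            continue
        try:
            float(fields[1])  # the spammicity score must parse as a number
        except ValueError:
            problems.append((number, "spammicity is not a number"))
    return problems
```

For example, `write_predictions("predictions.txt", [("normal", 0.4), ("spam", 0.9)])` writes two lines, and `check_predictions("predictions.txt")` then returns an empty list. The check covers only the layout of the columns; it does not verify the class labels or the range of the scores, since those are not specified here.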

Formatting instructions

Send both the file containing the predictions and the preliminary version of your report as email attachments to the course email address. The 'Subject' field should be formatted as:

mlproject2007, predictions, 12345X
where 12345X is the student number of the corresponding author of the report. Just pick either of the group members to act in this role. Ideally he/she also sends the email.

In the email text, list the names, email addresses and student numbers of both group members. Also indicate in the email if you would rather not present your work to the other students on 30 November.

The name of the file containing the predictions must be of the form:

where 12345X is the student number of the corresponding author. It is important that this number is the same as the one appearing in the Subject-field of the email!

The name of the preliminary report must be of the form:

where 12345X is again the student number of the corresponding author. Note that we strongly prefer PDF! Files in MS Word format are accepted, but it is the students' responsibility to make sure that they print correctly with OpenOffice. Also, check that the PDF is not garbled, which can happen with some of the Word-to-PDF converters out there. A common warning sign is a PDF that is several megabytes in size even though the document contains only text and a few graphs. (If you prepare the report using LaTeX, it doesn't matter if the fonts appear slightly "blurred" when the PDF is viewed on a computer. As long as the report looks fine when printed, everything is OK.)


Page last updated Friday, 09-Nov-2007 15:31:52 EET