

You may freely use and redistribute the data set, with or without modification. Please cite the technical report, as you see appropriate, if you use or redistribute the data set:

Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in Computer and Information Science, Report A82. 3 March 2005. Data set at http://www.cis.hut.fi/eyechallenge2005/

[The technical report as PDF]

Competition 1
File             Description                                              # of Assignments  File size
c1info.txt       Data description
c1_train.dat     Training data set                                                          1.4 M
c1_validate.dat  Validation data set                                                        0.5 M
validate1.tru    Validation data set labels (published on 1 August 2005)                    3 K
c1_test.dat      Test data set (published on 1 August 2005)                                 0.7 M
test1.tru        Test data set labels (published on 18 October 2005)                        3.5 K
c1_data.zip      All data presently available for Competition 1                             0.8 M

Competition 2
File             Description                                              # of Assignments  File size
c2info.txt       Data description
c2_train.dat     Training data set                                                          5.1 M
c2_train.wrd     Word locations for c2_train.dat                                            380 K
c2_validate.dat  Validation data set                                                        2.4 M
c2_validate.wrd  Word locations for c2_validate.dat                                         160 K
validate2.tru    Validation data set labels (published on 1 August 2005)                    3 K
c2_test.dat      Test data set                                                              2.6 M
c2_test.wrd      Word locations for c2_test.dat                                             165 K
test2.tru        Test data set labels (published on 18 October 2005)                        3.5 K
c2_data.zip      All data presently available for Competition 2                             3.3 M


The Challenge is divided into two competitions:

Competition 1 (preprocessed data)
A straightforward classification task. We provide pre-computed feature vectors, with class labels, for each word in the eye movement trajectory. The objective is to predict the class labels for the corresponding trajectories in the test data set. The test data set for Competition 1 was published on 1 August 2005, and the classification results must be submitted by 30 September 2005.
Competition 2 (raw data)
Here you can apply all your knowledge about reading behaviour and advanced time series modeling. We provide the raw eye movement data that was used to extract the features in Competition 1, together with the associated word coordinates. The test data set for Competition 2 will be published on 1 October 2005, and the results must be submitted by noon GMT on 14 October 2005.

The data set for each competition is divided into the following parts, published according to the schedule below. The objective of the Challenge is to predict the classification labels (I, R, C) in the test set as accurately as possible.

Data set name                      Data set publication date  Label publication date
Full training set: Training set    1 March 2005
Full training set: Validation set  1 March 2005               1 August 2005
Test set: Competition 1            1 August 2005              After 14 October 2005
Test set: Competition 2            1 October 2005             After 14 October 2005

Full training set
The full training set consists of 50 assignments collected from 11 test subjects. Each assignment consists of a question followed by ten sentences (titles of news articles). One of the sentences is the correct answer to the question (C), four are relevant to the question but do not answer it (R), and five are irrelevant to the question (I). The full training set is further divided into a training set and a validation set. Contestants are encouraged to improve, evaluate and benchmark their method with the validation data set before submitting their final test set result. The labels of the validation data set will be published on 1 August 2005.
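
The class proportions described above suggest a simple reference point for any method. The following is a minimal sketch, assuming accuracy is counted per sentence; the function name and the in-memory list of labels are hypothetical and do not reflect the data file format, which is described in c1info.txt and c2info.txt.

```python
from collections import Counter

# Label composition of one assignment, as described in the text:
# five irrelevant (I), four relevant (R) and one correct (C) sentence.
ASSIGNMENT_LABELS = ["I"] * 5 + ["R"] * 4 + ["C"]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Always answering "I" is right for 5 of the 10 sentences.
baseline = majority_baseline_accuracy(ASSIGNMENT_LABELS)
```

Under these assumptions, a classifier is only informative if it scores above this constant-prediction level on the validation set.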
Test set
The test set consists of 180 assignments. The objective of the challenge is to predict the correct classification labels (I, R, C) in the test set based on the eye movements alone. Each assignment in the test set has five irrelevant (I), four relevant (R) and one correct (C) sentence, just like the training set. All assignments in the test sets are unique. Assignments where the test subject gave a wrong answer have been excluded. The test assignments were collected from a subset of our 11 test subjects. The test set data may differ slightly, in a statistical sense, from the training and validation sets. For instance, in the training data set the position of the correct answer was balanced equally across the ten lines, whereas in the test data set the line containing the correct answer was chosen randomly and independently of the other assignments. The average title length and the distribution of title topics (political, economic, etc.) may also differ.
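
Because every assignment has the same label composition, predictions can be constrained to respect it. The sketch below is one illustrative heuristic, not a method from the technical report: it assumes some model has already produced one relevance score per sentence (the scores and the function name are hypothetical), then labels the top-scoring sentence C, the next four R and the remaining five I.

```python
def assign_labels_by_rank(scores):
    """Map ten per-sentence scores to labels with the 5 I / 4 R / 1 C composition."""
    if len(scores) != 10:
        raise ValueError("expected ten sentence scores per assignment")
    # Sentence indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = ["I"] * len(scores)
    labels[order[0]] = "C"       # highest score: the correct answer
    for i in order[1:5]:         # next four: relevant sentences
        labels[i] = "R"
    return labels


# Hypothetical scores for the ten sentences of one assignment.
example_scores = [0.1, 0.9, 0.4, 0.2, 0.7, 0.05, 0.6, 0.3, 0.5, 0.15]
example_labels = assign_labels_by_rank(example_scores)
```

A single score per sentence is a simplification; a classifier that outputs separate I, R and C probabilities would need a different assignment rule.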

More detailed information on the data format can be found in the files c1info.txt and c2info.txt. Please see the technical report for information on eye movements, the experimental setup, baseline methods and references:

Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in Computer and Information Science, Report A82. 3 March 2005. [PDF]

