Datasets
You may freely use and redistribute the data set, with or without
modification.
If you use or redistribute the data set, please cite the technical
report as you see fit:
Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo
Kojo,
Samuel Kaski. Inferring Relevance from Eye Movements: Feature
Extraction. Helsinki University of Technology, Publications in Computer
and Information Science, Report A82. 3 March 2005. Data set at
http://www.cis.hut.fi/eyechallenge2005/
[The technical report as PDF]
Competition 1
File | Description | # of Assignments | File size
c1info.txt | Data description | |
c1_train.dat | Training data set | 336 | 1.4 M
c1_validate.dat | Validation data set | 149 | 0.5 M
validate1.tru | Validation data set labels (published on 1 August 2005) | 149 | 3 K
c1_test.dat | Test data set (published on 1 August 2005) | 180 | 0.7 M
test1.tru | Test data set labels (published on 18 October 2005) | 180 | 3.5 K
c1_data.zip | All data presently available for Competition 1 | | 0.8 M
Competition 2
File | Description | # of Assignments | File size
c2info.txt | Data description | |
c2_train.dat | Training data set | 336 | 5.1 M
c2_train.wrd | Word locations for c2_train.dat | | 380 K
c2_validate.dat | Validation data set | 149 | 2.4 M
c2_validate.wrd | Word locations for c2_validate.dat | | 160 K
validate2.tru | Validation data set labels (published on 1 August 2005) | 149 | 3 K
c2_test.dat | Test data set | 180 | 2.6 M
c2_test.wrd | Word locations for c2_test.dat | | 165 K
test2.tru | Test data set labels (published on 18 October 2005) | 180 | 3.5 K
c2_data.zip | All data presently available for Competition 2 | | 3.3 M
Instructions
The Challenge is divided into two competitions:
- Competition 1 (preprocessed data)
- A straightforward classification task. We provide
pre-computed feature vectors for each word in the eye movement trajectory, with
class labels. The objective is to predict the class labels for the corresponding
trajectory in the test data set. The test data set for Competition 1 was published on 1 August
2005 and the classification results must be submitted by 30 September 2005.
- Competition 2 (raw data)
- You can apply all your knowledge about reading
behaviour and advanced time series modeling. We provide you with the raw eye
movement data that was used to extract the features in Competition 1, and the
associated word coordinates. The test data set for Competition 2 will be published
on 1 October 2005 and the results must be submitted by noon GMT on
14 October 2005.
The data set for each competition is divided into the following parts,
which are published according to the schedule below. The objective of the Challenge is to
predict the classification labels (I, R, C) in the test set as accurately as possible.
Data set name | Data set publication date | Label publication date
Full training set: Training set | 1 March 2005 |
Full training set: Validation set | 1 March 2005 | 1 August 2005
Test set: Competition 1 | 1 August 2005 | After 14 October 2005
Test set: Competition 2 | 1 October 2005 | After 14 October 2005
- Full training set
- The full training set consists of 50 assignments
collected from 11 test subjects. Each assignment consists of a
question followed by ten sentences (titles of news articles). One of the sentences
is the correct answer to the question (C), four of the sentences are relevant to the
question but do not answer it (R), and five of the sentences are
irrelevant to the question (I). The full training set is further divided into:
- Training set
- Validation set
Contestants are encouraged to improve, evaluate
and benchmark their method with
the validation data set before submitting their final test set result.
The labels of the validation data set will be published on 1 August 2005.
- Test set
- The test set consists of 180 assignments.
The objective of the challenge
is to predict the correct classification labels (I, R, C) in the test set
based on the eye movements alone. All assignments in the test set
have five irrelevant (I), four relevant (R) and one
correct (C) sentence, just like the training set.
All assignments in the test sets are unique. Assignments where the test subject gave a wrong
answer have been excluded. The test assignments were collected from a subset
of our 11 test subjects.
The test set data may differ slightly in its statistics from
the training and validation sets. For instance, in the training data set the
correct answers were balanced equally across the ten lines, whereas in the test data set
the lines containing the correct answers were chosen randomly and independently of
the other assignments. The average title length and the
distribution of title topics (political, economic, etc.) may also differ.
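The class composition and the prediction objective described above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and plain per-sentence accuracy is an assumption about what "as accurately as possible" means, not a statement of the official scoring rule.

```python
from collections import Counter

# Number of sentences per class in one assignment, as stated in the text.
EXPECTED_COUNTS = {"I": 5, "R": 4, "C": 1}


def has_expected_composition(labels):
    """Return True if the labels hold exactly five I, four R and one C."""
    return Counter(labels) == Counter(EXPECTED_COUNTS)


def accuracy(predicted, true):
    """Fraction of sentences whose predicted label equals the true label.

    Assumption: plain accuracy; the challenge may have scored differently.
    """
    if len(predicted) != len(true):
        raise ValueError("label lists must have the same length")
    if not true:
        raise ValueError("label lists must not be empty")
    matches = sum(p == t for p, t in zip(predicted, true))
    return matches / len(true)
```

A predictor could use `has_expected_composition` as a sanity check on the ten labels it assigns to one assignment, since every assignment is stated to have the same class composition.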
More detailed information on the data format can
be found in files c1info.txt and
c2info.txt. Please
see the technical report for information on
eye movements,
experimental setup, baseline methods and references:
Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo,
Samuel Kaski. Inferring Relevance from Eye Movements: Feature
Extraction. Helsinki University of Technology, Publications in Computer
and Information Science, Report A82. 3 March 2005. [PDF]
Competition 1 (preprocessed data)
- Features are in columns, feature vectors in rows.
- Data consists of several assignments. Each assignment is a time sequence
of 22-dimensional feature vectors.
- The first column is the line number, second the assignment number and the next
22 columns (3 to 24) are the different features. Columns 25 to 27 contain
extra information about the example. The training data set
contains the classification label in the 28th column: "0" for irrelevant,
"1" for relevant and "2" for the correct answer.
- Each example (row) represents a single word. You are asked to
return the classification of each read sentence.
- The 22 features provided are commonly used in psychological studies on eye
movement. Not all of them are necessarily relevant in this context.
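The column layout above can be read with a short parser. The sketch below assumes that the columns are separated by whitespace and that all values are numeric; neither is stated here, so check c1info.txt first. The function names are illustrative. Because each row is one word and the task is to classify sentences, the rows are grouped by assignment number and line number, on the assumption that this pair identifies one read sentence.

```python
from collections import defaultdict

N_FEATURES = 22  # columns 3 to 24 in the description above


def parse_row(line, has_label=True):
    """Split one row into the documented parts (assumes whitespace separators)."""
    cols = line.split()
    row = {
        "line": int(float(cols[0])),        # column 1: line number
        "assignment": int(float(cols[1])),  # column 2: assignment number
        "features": [float(v) for v in cols[2:2 + N_FEATURES]],  # columns 3-24
        "extra": cols[24:27],               # columns 25-27, kept as raw text
    }
    if has_label:
        # Column 28: 0 = irrelevant, 1 = relevant, 2 = correct answer.
        row["label"] = int(float(cols[27]))
    return row


def group_by_sentence(rows):
    """Collect word-level rows under an (assignment, line) key."""
    sentences = defaultdict(list)
    for row in rows:
        sentences[(row["assignment"], row["line"])].append(row)
    return dict(sentences)


def load_c1(path, has_label=True):
    """Read a Competition 1 file and return its rows grouped by sentence."""
    with open(path) as handle:
        rows = [parse_row(line, has_label) for line in handle if line.strip()]
    return group_by_sentence(rows)
```

For example, `load_c1("c1_train.dat")` would return a dictionary of sentences, each holding its word rows in file order. For files without a label column, pass `has_label=False`.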
Competition 2 (raw data)
- Features are in columns, gaze points in rows.
- Each dat-file contains a set of assignments. Each assignment is a
time series of eye fixations and the mean diameter of the pupils. For exact
file information refer to c2info.txt. The assignment number
matches the assignment number in the corresponding wrd-file.
- The wrd-files include the locations of words. This
enables matching gaze coordinates to words according to a
criterion of your choice. Training and validation data sets contain several examples of the
same assignment, but in the test set all assignments are unique.
- You are asked to classify the sentences in files c2_validate.wrd and
c2_test.wrd.
File c2_train.wrd contains the correct
classification for the training data in the 7th column: "0" for irrelevant
data, "1" for relevant and "2" for the correct answer. You must classify all
sentences (seen and unseen).
- Data sample rate is 50 Hz.
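One possible matching criterion is to assign each gaze point to the nearest word. The sketch below illustrates that idea only. The `Word` fields (an index and a centre coordinate) are hypothetical and do not reflect the actual wrd-file columns, which are described in c2info.txt. It also assumes that each row of a dat-file is one sample, so that at 50 Hz every sample accounts for 20 ms.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 50  # stated above; one sample every 20 ms


@dataclass
class Word:
    index: int  # hypothetical word identifier
    x: float    # hypothetical horizontal centre of the word
    y: float    # hypothetical vertical centre of the word


def sample_time_ms(sample_index):
    """Time of a sample relative to the first sample, in milliseconds."""
    return sample_index * 1000.0 / SAMPLE_RATE_HZ


def nearest_word(gaze_x, gaze_y, words, max_distance=None):
    """Return the word whose centre is closest to the gaze point, or None."""
    best, best_d2 = None, None
    for word in words:
        d2 = (word.x - gaze_x) ** 2 + (word.y - gaze_y) ** 2
        if best_d2 is None or d2 < best_d2:
            best, best_d2 = word, d2
    if best is None:
        return None
    # Optionally reject gaze points that are too far from every word.
    if max_distance is not None and best_d2 > max_distance ** 2:
        return None
    return best


def dwell_time_ms(gaze_points, words, max_distance=None):
    """Add one sample period for each gaze point matched to a word."""
    period_ms = 1000.0 / SAMPLE_RATE_HZ
    totals = {}
    for x, y in gaze_points:
        word = nearest_word(x, y, words, max_distance)
        if word is not None:
            totals[word.index] = totals.get(word.index, 0.0) + period_ms
    return totals
```

Nearest-centre matching is a simple baseline. Other criteria, such as bounding boxes or a distance threshold set through `max_distance`, may suit the data better.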