This page describes the data provided for the MEG Mind reading challenge organized at ICANN'11.

Stimuli

We presented three separate sequences of video clips from three categories (artificial: clips showing animated shapes or text; football: clips showing sequences from a football match; and nature: clips from a nature documentary). Each sequence contained a roughly equal share of clips from each of the three categories. The clips lasted 6-26 seconds. Within each sequence, consecutive clips were separated by a 5 s 'rest' interval, during which the subject was shown a crosshair in the center of the visual field. The first sequence was repeated twice, while the other two were shown only once.

In addition to the above categories of short clips, we also showed two long film sequences (about 20 min in duration). The first was a Mr. Bean film and the second was a sequence from a Charlie Chaplin feature film. We thank Kranthi Kumar Nallamothu for collecting and editing the stimuli.

The whole experiment was repeated on two consecutive days, using the same subject and the same stimuli. The stimuli were presented without audio.

Subjects

We recorded magnetoencephalography (MEG) signals from one healthy male subject aged 25, who gave written permission for the data to be distributed anonymously to the community.

Recordings

MEG was acquired with a 306-channel MEG system (Elekta Neuromag Oy, Helsinki, Finland), low-pass filtered at 330 Hz and digitized at 1000 Hz. During the MEG recording, four small coils, whose locations had been digitized with respect to anatomical landmarks, were briefly energized to determine the subject's head position with respect to the MEG sensors. The data is provided for the 204 planar gradiometer channels.

Permission

The data is made available for the purpose of participating in the competition. Before the ICANN'2011 conference the data may be used solely for that purpose. The data can, however, be used for other purposes after the challenge results have been presented in June 2011.

UPDATE (July 7th, 2011): The data can now be used for non-commercial research purposes. In case you use the data in a publication, please cite the challenge report accordingly. Full citation details will be added in August. In case you have any questions, send email to icann2011.meg@cis.hut.fi.

Preprocessing

The continuous raw MEG data were low-pass filtered at 50 Hz and downsampled to a sampling rate of 200 Hz. External interference was removed and head movements were compensated for using the signal space separation (SSS) method of Taulu and Kajola (2005).

The signal was further band-pass filtered with a bank of five filters, with frequency bands centered around 2, 5, 10, 20, and 35 Hz. The data files include both the original signal (after the preprocessing operations described above) and the five signals produced by the filter bank.

Note that the filter bank was applied to the full-length signals, before the 1 s windows were extracted. You may use the unfiltered measurements to extract other frequency bands or to perform other kinds of preprocessing, but you will have to operate on the 1 s time windows, as we do not distribute the full-length signals.
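Because each distributed window holds 200 samples at 200 Hz, a discrete Fourier transform of one window has a frequency resolution of 1 Hz. The following is a minimal sketch, not the organizers' method, of estimating the power in a custom frequency band from one channel of an unfiltered window. The helper name band_power and the synthetic test signal are assumptions made for illustration only.

```python
import cmath
import math

FS = 200  # sampling rate of the distributed windows, in Hz


def band_power(signal, low_hz, high_hz, fs=FS):
    """Sum the DFT power of `signal` over bins lying in [low_hz, high_hz].

    Hypothetical helper: a plain O(N^2) DFT, adequate for 200-sample windows.
    """
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n  # bin k corresponds to k * fs / n Hz
        if low_hz <= freq <= high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            power += abs(coeff) ** 2
    return power


# Synthetic 1 s window containing a pure 10 Hz oscillation (not real MEG data).
window = [math.sin(2 * math.pi * 10 * t / FS) for t in range(FS)]
power_8_12 = band_power(window, 8, 12)
power_15_25 = band_power(window, 15, 25)
```

With the synthetic window, essentially all of the power falls in the 8-12 Hz band. Real windows will rarely contain a whole number of cycles, so some spectral leakage between neighbouring bins should be expected.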

Training and test data

The training and test data were created by extracting time periods of 1s length, while discarding 1s of measurements between two consecutive samples. Each time window falls completely within one stimulus category, and the training data is provided with the stimulus labels. The samples are provided in random order.
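The extraction scheme can be sketched as follows, assuming the 200 Hz signal and a regular alternation of one kept second and one discarded second. The exact offsets used by the organizers are not documented, so the starting index and the function name window_bounds are hypothetical, and the sketch does not model the requirement that a window lies within a single stimulus category.

```python
FS = 200      # samples per second after downsampling
WINDOW_S = 1  # length of each extracted sample, in seconds
GAP_S = 1     # discarded interval between consecutive samples, in seconds


def window_bounds(n_samples, fs=FS, window_s=WINDOW_S, gap_s=GAP_S):
    """Return (start, stop) index pairs of the windows that fit in n_samples.

    Indices are 0-based and `stop` is exclusive, as in Python slicing.
    """
    win = window_s * fs
    step = (window_s + gap_s) * fs
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, step)]


# A hypothetical 10 s stretch of continuous signal (2000 samples).
bounds = window_bounds(10 * FS)
```

Under these assumptions, a 10 s stretch yields five windows of 200 samples each, and half of the recorded signal is discarded.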

The task is to learn to predict the test examples extracted from the measurements of the second day, using labeled examples from the first day. The training and test sets are both made roughly (but not exactly) class balanced and crafted so that some samples share the same stimulus while some do not. That is, for some of the test examples there exists a training example that was recorded during presentation of the exact same video sequence, whereas for some the corresponding time instance was not included in the training data.

To enable modeling possible changes in data distribution between the two days, a small random portion of data measured during the second day is provided with class labels. This data can be used freely for learning the classifier.

Overall, the setup results in 677 labeled training examples from the recordings of the first day, 50 labeled examples from the recordings of the second day, and 653 unlabeled examples from the second day. The task is to infer the labels for these 653 examples.

Description of the data files

The data is distributed as three Matlab files. Two of the files contain the actual data, while the third includes the locations of the planar gradiometer sensors. Note that we will not give out the actual stimulus videos or the full length signals.

NOTE: Due to a mistake in preprocessing, new versions of the data files were released on February 8th. If you downloaded the files before that, please download them again. The training part of the data contains the same samples with corrected preprocessing, whereas the test samples are completely new.

megicann_secret.mat: (3625 bytes, July 7th, 2011)

release of the label information for the test samples, to enable further research on the data set after the competition ended
contains four variables: detailedTrain, detailedValid, detailedTest, class_test_day2
the first three variables tell for each training, validation and test sample, respectively, the original measurement batch (number from 1-10, so that the first five correspond to day 1 and the last five to day 2) and the ordinal number of the sample within that batch.
detailedTest also has a third column, telling for each sample whether the same stimulus clip was also shown in the training phase
class_test_day2 includes the true classes for the test samples
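As an illustration of how the third column of detailedTest could be used together with class_test_day2, the sketch below computes accuracy separately for test samples whose stimulus clip was or was not seen in training. It is not a reimplementation of computeError.m, whose error measures are not described here. The function name, the toy data, and the assumption that a nonzero flag means "seen in training" are all hypothetical.

```python
def accuracy_by_seen(predicted, true_classes, seen_flags):
    """Return accuracy for seen (True) and unseen (False) stimuli.

    Assumption: a nonzero flag marks a clip that was also shown in training.
    A group with no samples gets None instead of an accuracy.
    """
    totals = {True: 0, False: 0}
    correct = {True: 0, False: 0}
    for pred, true, seen in zip(predicted, true_classes, seen_flags):
        key = bool(seen)
        totals[key] += 1
        correct[key] += int(pred == true)
    return {key: (correct[key] / totals[key] if totals[key] else None)
            for key in totals}


# Toy values standing in for predictions, class_test_day2 and the seen flags.
predicted = [1, 2, 3, 4, 5, 1]
true_classes = [1, 2, 3, 4, 1, 2]
seen_flags = [1, 1, 1, 0, 0, 0]
result = accuracy_by_seen(predicted, true_classes, seen_flags)
```

Reporting the two groups separately shows how much a classifier relies on having observed the exact same video sequence during training.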

computeError.m: (4292 bytes, July 7th, 2011)

a Matlab script for computing the error measures reported for the competition submissions
the numbers provided by the script can be directly compared with the results in the challenge

megicann_train_v2.mat: (1366828893 bytes, February 8th, 2011)

contains four variables: train_day1, train_day2, class_day1, and class_day2
train_day1 is a cell array of 6 elements. The first element contains the unfiltered signal while the remaining 5 contain the results of the filter bank, in order of increasing frequency band:
- train_day1{1}: unfiltered
- train_day1{2}: 2Hz
- train_day1{3}: 5Hz
- train_day1{4}: 10Hz
- train_day1{5}: 20Hz
- train_day1{6}: 35Hz
each element in the cell array is a 677x204x200 matrix of real values. Here 677 is the number of training samples, 204 is the number of channels, and 200 is the length of the signal within the 1s period (200Hz signal)
Example: train_day1{3}(5,50,100) addresses the signal filtered with the 5 Hz band-pass filter. The command retrieves the 100th signal value for the 50th channel of the 5th training sample
train_day2 is an otherwise identical cell array, but its elements contain only 50 samples
class_day1 and class_day2 are vectors of 677 and 50 elements, respectively, containing the class values of the training examples in the same order as they were given in the data files
The class values are:
1. artificial
2. football
3. nature
4. Bean
5. Chaplin
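Matlab indices start at 1, whereas many other environments count from 0. The sketch below is a small, hypothetical helper (not part of the distributed files) that converts the Matlab-style indices of train_day1 to 0-based ones and maps cell and class numbers to the names listed above. How the array axes are ordered after loading the file outside Matlab depends on the loader and is not covered here.

```python
# Order of the six cell-array elements, as listed in the file description.
BANDS = ["unfiltered", "2Hz", "5Hz", "10Hz", "20Hz", "35Hz"]

# Class values, as listed in the file description.
CLASS_NAMES = {1: "artificial", 2: "football", 3: "nature",
               4: "Bean", 5: "Chaplin"}

# Sizes for train_day1: cells, samples, channels, time points.
TRAIN_DAY1_SHAPE = (len(BANDS), 677, 204, 200)


def to_zero_based(cell, sample, channel, time_index, shape=TRAIN_DAY1_SHAPE):
    """Convert 1-based Matlab indices to 0-based ones, with bounds checks."""
    one_based = (cell, sample, channel, time_index)
    for value, size in zip(one_based, shape):
        if not 1 <= value <= size:
            raise IndexError(f"index {value} outside 1..{size}")
    return tuple(value - 1 for value in one_based)


# Matlab: train_day1{3}(5,50,100)
idx = to_zero_based(3, 5, 50, 100)
band = BANDS[idx[0]]
label = CLASS_NAMES[2]
```

For train_day2 or test_day2, the same helper can be called with the sample count in shape replaced by 50 or 653, respectively.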

megicann_test_v2.mat: (1225441083 bytes, February 8th, 2011)

contains the test examples in the variable test_day2, a cell array with the same structure as train_day1 in megicann_train_v2.mat, except that each element holds 653 samples

megicann_locations.mat: (1606 bytes, December 10th, 2010)

Variable locations contains the physical locations of the MEG sensors as a 102x3 matrix. The device has 102 sensor elements each containing 2 planar gradiometer sensors. The locations are listed in the same order as the channels are stored in the data files, so that the first two channels match the first location, the next two the second location etc.
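The channel-to-location correspondence described above can be written out explicitly. This is a minimal sketch using 1-based numbering, as in Matlab; the function names location_row and channels_at are hypothetical.

```python
N_LOCATIONS = 102             # sensor elements, i.e. rows of `locations`
N_CHANNELS = 2 * N_LOCATIONS  # two planar gradiometers per element


def location_row(channel):
    """Return the 1-based row of `locations` for a 1-based channel number."""
    if not 1 <= channel <= N_CHANNELS:
        raise ValueError(f"channel {channel} outside 1..{N_CHANNELS}")
    return (channel + 1) // 2


def channels_at(row):
    """Return the two 1-based channel numbers located at a 1-based row."""
    if not 1 <= row <= N_LOCATIONS:
        raise ValueError(f"row {row} outside 1..{N_LOCATIONS}")
    return (2 * row - 1, 2 * row)


first_pair_rows = (location_row(1), location_row(2))
last_row = location_row(204)
pair_at_50 = channels_at(50)
```

Channels 1 and 2 both map to row 1, channels 3 and 4 to row 2, and channel 204 to row 102.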