This page describes the data provided for the MEG Mind reading challenge organized at ICANN'11.
We presented three separate sequences comprising video clips in three different categories (artificial: clips showing animated shapes or text; football: clips showing sequences from a football match; and nature: clips from a nature documentary). Each sequence comprised a roughly equal distribution of clips from the three categories. The clips were 6-26 seconds long. Within each sequence, consecutive clips were separated by a 5 s 'rest' interval, during which the subject was shown a crosshair in the center of the visual field. The first sequence was repeated twice, while the other two were shown only once.
In addition to the above categories of short clips, we also showed two long film sequences (about 20 min in duration). The first was a Mr. Bean film and the second was a sequence from a Charlie Chaplin feature film. We thank Kranthi Kumar Nallamothu for collecting and editing the stimuli.
The whole experiment was repeated on two consecutive days, using the same subject and the same stimuli. The stimuli were presented without audio.
We recorded magnetoencephalography (MEG) signals from one healthy male subject aged 25, who has given written permission for the data to be distributed anonymously to the community.
MEG was acquired with a 306-channel MEG system (Elekta Neuromag Oy, Helsinki, Finland), low-pass filtered to 330 Hz and digitized at 1000 Hz. During the MEG recording, four small coils, whose locations had been digitized with respect to anatomical landmarks, were briefly energized to determine the subject's head position with respect to the MEG sensors. The data is provided for the 204 planar gradiometer channels.
The data is made available for the purpose of participating in the competition. Before the ICANN'2011 conference the data may be used solely for that purpose. The data can, however, be used for other purposes after the challenge results have been presented in June 2011.
UPDATE (July 7th, 2011): The data can now be used for non-commercial research purposes. In case you use the data in a publication, please cite the challenge report accordingly. Full citation details will be added in August. In case you have any questions, send email to email@example.com.
The continuous, raw MEG data were low-pass filtered to 50 Hz and downsampled to a sampling rate of 200 Hz. External interference was removed and head movements were compensated for using the signal space separation (SSS) method by Taulu and Kajola (2005).
The signal was further band-pass filtered with a filter bank of five filters, with frequency bands centered around 2, 5, 10, 20, and 35 Hz. The data files include both the original signal (after the preprocessing operations described above) and the five signals resulting from the filter bank.
Note that the filter bank was applied to the full-length signals. You may use the unfiltered signal measurements to extract other frequency bands or perform other kinds of preprocessing, but you need to operate on the 1 s time windows to do so, as we do not distribute the full-length signal.
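As an illustration of extracting another frequency band directly from a single 1 s window, the following minimal sketch zeroes all discrete Fourier transform components outside a chosen band. This is a simple brick-wall filter chosen for illustration, not the filter bank used to produce the distributed signals, and the function name dft_bandpass and the synthetic test signal are assumptions made for the example. With 200 samples at 200 Hz the frequency resolution is 1 Hz, and edge effects should be expected on real data because the window is so short.

```python
import cmath
import math


def dft_bandpass(window, sfreq, low_hz, high_hz):
    """Keep only DFT components whose frequency lies in [low_hz, high_hz]."""
    n = len(window)
    # Forward DFT; O(n^2), which is fine for a 200-sample window.
    spectrum = [
        sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Frequency of bin k in Hz; the upper half mirrors the lower half.
        freq = min(k, n - k) * sfreq / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the mask is symmetric, so the result is real for real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Synthetic single-channel 1 s window: a 10 Hz plus a 40 Hz sinusoid.
sfreq = 200
n = 200
window = [
    math.sin(2 * math.pi * 10 * t / sfreq) + math.sin(2 * math.pi * 40 * t / sfreq)
    for t in range(n)
]

# Keep only the 8-12 Hz band; the 40 Hz component is removed.
band_10hz = dft_bandpass(window, sfreq, 8, 12)
```

On real windows the same function would be applied to each of the 204 channels separately. A tapered window or a properly designed band-pass filter would reduce the ringing that a brick-wall mask introduces.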
The training and test data were created by extracting time periods of 1s length, while discarding 1s of measurements between two consecutive samples. Each time window falls completely within one stimulus category, and the training data is provided with the stimulus labels. The samples are provided in random order.
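The windowing scheme can be sketched as follows for a single channel. This is only an illustration of the 1 s window and 1 s gap arithmetic at the 200 Hz sampling rate. The function name extract_windows and the dummy signal are assumptions, and the sketch ignores the requirement that each window fall within a single stimulus category as well as the final random shuffling.

```python
def extract_windows(signal, sfreq=200, window_s=1.0, gap_s=1.0):
    """Cut a 1-D signal into fixed-length windows separated by discarded gaps."""
    win = int(round(window_s * sfreq))         # samples kept per window
    step = win + int(round(gap_s * sfreq))     # distance between window starts
    windows = []
    start = 0
    while start + win <= len(signal):
        windows.append(signal[start:start + win])
        start += step
    return windows


# Ten seconds of a dummy single-channel signal at 200 Hz;
# each value equals its own sample index.
signal = list(range(10 * 200))
windows = extract_windows(signal)

# Windows start at samples 0, 400, 800, 1200 and 1600.
print(len(windows), len(windows[0]), windows[1][0])  # prints: 5 200 400
```

Each resulting window therefore holds 200 time points per channel, that is, 204 x 200 values per example for the gradiometer data.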
The task is to learn to predict the stimulus category of the test examples extracted from the measurements of the second day, using labeled examples from the first day. The training and test sets are both made roughly (but not exactly) class balanced and crafted so that some samples share the same stimulus while some do not. That is, for some of the test examples there exists a training example that was recorded during presentation of the exact same video sequence, whereas for others the corresponding time instance was not included in the training data.
To enable modeling possible changes in data distribution between the two days, a small random portion of data measured during the second day is provided with class labels. This data can be used freely for learning the classifier.
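One simple way to account for a shift between the two days is to standardize the features of each day separately before training a classifier. The sketch below shows this on made-up two-dimensional features. It is not a method prescribed by the challenge, and the function name zscore_per_day and all numbers are hypothetical. Because the statistics require no labels, they could in principle be estimated from all second-day examples rather than only the labeled ones.

```python
from statistics import mean, pstdev


def zscore_per_day(features):
    """Standardize each feature column using the statistics of this day only."""
    columns = list(zip(*features))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in features
    ]


# Hypothetical features; day 2 is a shifted and rescaled version of day 1.
day1 = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
day2 = [[11.0, 5.0], [13.0, 10.0], [15.0, 15.0]]

day1_z = zscore_per_day(day1)
day2_z = zscore_per_day(day2)
```

After standardization the two toy data sets coincide, so a classifier trained on day1_z would see day2_z on the same scale. Real between-day changes are unlikely to be a pure shift and rescaling, so this should be treated as a baseline only.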
Overall, the setup results in 677 labeled training examples from the recordings of the first day, 50 labeled examples from the recordings of the second day, and 653 unlabeled examples from the second day. The task is to infer the labels for these 653 examples.
The data is distributed as three Matlab files. Two of the files contain the actual data, while the third includes the locations of the planar gradiometer sensors. Note that we will not give out the actual stimulus videos or the full length signals.
NOTE: Due to a mistake in preprocessing, new versions of the data files were released on February 8th. If you downloaded the files before that, please download them again. The training part of the data contains the same samples with corrected preprocessing, whereas the test samples are completely new.