META-NET Workshop and Challenge in ICANN 2011 and 2012:
Context in Machine Translation

A workshop organized at
ICANN 2011 Conference by
META-NET Network of Excellence

Information on the challenge schedule and data
New: Information on the result workshop at ICANN 2012

BACKGROUND

Machine translation (MT) can be considered one of the most challenging tasks computer science has ever taken on. Statistical methods have been increasingly successful in efficiently providing MT solutions for many language pairs. However, there is a lot of room for improvement in translation quality. Prototypical sentences are translated well, but in certain situations the end result is far from what is expected. One central reason for the failures is that current systems take context into account only in a limited manner.

In natural language processing, the context of use has a considerable impact on the understanding process. It can refer to multiple kinds of meta-data, including information on the document type, domain, genre and medium used. Automatic machine translation systems typically restrict the considered context to one sentence or smaller parts of it.

In order to encourage research in this area, the META-NET Network of Excellence (http://www.meta-net.eu) launches a series of challenges. The first challenge, Context in Machine Translation, is organized as an associated event of the ICANN 2011 and ICANN 2012 conferences. ICANN 2011 was held from June 14th to 17th, 2011, in Finland (http://www.cis.hut.fi/icann11/). ICANN 2012 will be held in Lausanne, Switzerland.

The challenge was announced during the META-NET workshop in ICANN 2011 on Tuesday, 14th of June.

OBJECTIVES AND TASKS

The challenge aims at advancing machine translation research by providing a concrete application area for supervised or unsupervised algorithms whose objective is to learn to assess the quality of the translations in the given context of use.

The problem will be formulated as a machine learning (ML) task that does not require much investment in MT from ML researchers. In particular, the participants do not need to train any machine translation system, and the data will provide a set of reasonable context features to reduce the amount of language processing needed.

The concrete task is to choose the best translation from a set of likely translations produced by multiple machine translation systems, with the help of additional information about the context in which the translation occurs (domain, surrounding text, etc.). This is typically described as re-ranking and should lead to improved translation performance scores compared to the translations originally selected by the respective MT system.

N-best re-ranking can be seen as a subproblem of structured prediction. Given data from methodologically different MT systems, selecting the best translation poses a multi-task learning problem.
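
The re-ranking task described above can be illustrated with a minimal sketch. It assumes a simple linear scoring model; the feature names (`mt_score`, `domain_lm`), the weights, and the example sentences are hypothetical and are not taken from the challenge data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    system: str                 # hypothetical MT system identifier
    text: str                   # candidate translation
    features: Dict[str, float]  # hypothetical feature values (log scores)


def score(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Linear score: weighted sum of the candidate's feature values."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())


def rerank(candidates: List[Candidate], weights: Dict[str, float]) -> List[Candidate]:
    """Return the candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Hypothetical 2-best list from one system. "mt_score" stands for the
# system's own score, "domain_lm" for a context feature such as a
# domain-specific language model score.
nbest = [
    Candidate("sysA", "the bank of the river", {"mt_score": -2.0, "domain_lm": -5.0}),
    Candidate("sysA", "the river bank", {"mt_score": -2.5, "domain_lm": -3.0}),
]
weights = {"mt_score": 1.0, "domain_lm": 0.5}  # assumed, not learned here

original_best = max(nbest, key=lambda c: c.features["mt_score"])
reranked_best = rerank(nbest, weights)[0]
print(original_best.text)
print(reranked_best.text)
```

In this toy example the context feature changes the selection: the system's own score prefers the first candidate, while the combined score prefers the second. In practice the weights would be learned from the training data rather than set by hand.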

DATA

The data includes parts of the JRC Acquis corpus (http://wt.jrc.it/lt/Acquis/) and additional language models and context features derived from the corpus. The participants are not allowed to use extra data in order to ensure the comparability of the proposed solutions.

The evaluation is split into subtasks, depending on which context features are used. For each subtask, the following result sets are to be submitted:

a) given the outputs of one system, re-rank them (best translation of each MT system individually)
b) given the sets of outputs of all systems, re-rank them (best translation across all MT systems, which can use methods for combining models)
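
The difference between the two result sets can be sketched as follows. The sketch assumes that each candidate already carries a re-ranker score on a scale that is comparable across systems; the system names, sentences, and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# (system, translation, re-ranker score); all values are hypothetical.
Scored = Tuple[str, str, float]


def best_per_system(candidates: List[Scored]) -> Dict[str, str]:
    """Result set (a): the top-scoring translation of each system separately."""
    best: Dict[str, Scored] = {}
    for cand in candidates:
        system, _, value = cand
        if system not in best or value > best[system][2]:
            best[system] = cand
    return {system: cand[1] for system, cand in best.items()}


def best_overall(candidates: List[Scored]) -> str:
    """Result set (b): the top-scoring translation across all systems."""
    return max(candidates, key=lambda cand: cand[2])[1]


candidates = [
    ("sysA", "translation A1", 0.40),
    ("sysA", "translation A2", 0.55),
    ("sysB", "translation B1", 0.70),
    ("sysB", "translation B2", 0.10),
]
print(best_per_system(candidates))
print(best_overall(candidates))
```

Scores from methodologically different systems are generally not directly comparable, which is why result set (b) may call for methods that combine or calibrate models rather than a plain maximum.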

The performance of the submitted algorithms will be evaluated with BLEU, the standard automatic evaluation metric for machine translation.
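
For readers unfamiliar with BLEU, the following is a simplified sketch of a corpus-level score: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes a single reference per sentence and pre-tokenized input. The tokenization, casing, and scoring script actually used in the challenge are not specified here, so the numbers from this sketch should not be expected to match official results.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[str]],
                max_n: int = 4) -> float:
    """Simplified corpus-level BLEU with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram])
                                  for gram, count in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on a mat".split()
hypothesis = "the cat sat on the mat".split()
print(corpus_bleu([hypothesis], [reference]))
```

Because the statistics are accumulated over the whole test set before the precisions are computed, the corpus-level score is not the average of sentence-level scores.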

More information on the challenge and how to obtain the data is now available

SCHEDULE

14 Jun 2011, workshop and data set release (ICANN 2011, Espoo, Finland)
18 May 2012, test set released
8 Jun 2012, challenge task submissions due
New: Challenge submissions are still accepted
Sep 2012, workshop and challenge results (ICANN 2012, Lausanne, Switzerland)
New: See workshop at ICANN 2012

ORGANIZERS

The Context in Machine Translation Challenge is part of a series of challenges organized by the META-NET Network of Excellence (http://www.meta-net.eu), jointly by Aalto University (Finland), CNRS/LIMSI (France) and ILSP (Greece), supported by other network partners.

WORKSHOP

The objective of the META-NET workshop on Context in Machine Translation is to foster the exchange of ideas and results in this area. The notion of context is meant to be understood broadly, including other modalities (such as vision) in addition to textual context.

We expect participants from, e.g., the machine learning, machine translation, vision research and cognitive systems communities. The workshop is intended for anyone interested in the area as well as for those who plan to participate in the challenge.

Preliminary program of the workshop:

Tuesday, 14th of June, Aalto University School of Science, Espoo, Finland

13.00-13.15 Welcome and participant introductions
13.15-13.30 Timo Honkela: Objectives of the workshop
13.30-14.00 Jaakko Väyrynen: Presentation of the Context in MT challenge and data
14.00-14.45 Katerina Pastra: Bridging language, action and perception: the cognitive context of machine translation
14.45-15.00 Coffee break
15.00-16.00 Expert talks on different aspects of contextuality
- Mats Sjöberg: Visual context for natural language processing, with an example video (automatic labeling of video content)
- Mikko Kurimo: Context in multilingual speech processing - Adaptation of speech models
- Martin Takac: Autonomous construction of semantic representations
16.00-16.50 Discussion

For more information on the conference venue and registration, please see http://www.ics.tkk.fi/icann11/.
