Machine Translation Evaluation Workshop & Shared Task
Aim & Scope
The aims of the workshop are (1) to develop a common testbed and (2) to evaluate different machine translation approaches for the English-Russian language pair. Translations will be evaluated on an unseen test set using different machine translation evaluation methods. The workshop is organized by the Russian Information Retrieval Evaluation Seminar (ROMIP) in cooperation with TAUS Labs. The workshop is open to all kinds of MT systems and technologies. Experienced and early-stage researchers, as well as industrial developers, are welcome to participate in the evaluation campaign and the workshop. The workshop will take place following the Dialog conference on computational linguistics and intelligent technologies, which is traditionally held at the end of May or the beginning of June in the Moscow suburbs.
A test dataset of about 150,000 sentences originally written in English will be made available to the participants, who will be requested to submit the whole dataset translated into Russian. The dataset consists of about 10,800 news articles concatenated into a single file. The sentences in the file are numbered continuously, one sentence per line; a blank line separates documents. A tab delimits the sentence number and the sentence itself. Since the dataset is prepared in a fully automated manner, some flaws are present in the data (e.g. incorrect sentence boundaries, noisy document markup, etc.). Please refer to the example. The format of the results is the same: sentence number, tab, translation. All sentence numbers must be preserved in the result file. In case your system rejects a sentence for some reason, please preserve the line beginning with the sentence number. Participants submit their results by sending a link to a zipped file to the organizers. The participants are free to use their own systems and any data, including freely available resources, to complete the task.
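As an illustration of the expected input and output layout, here is a minimal Python sketch; the file names and the translate() stub are hypothetical and stand in for an actual MT system:

```python
# Minimal sketch of reading the source file and writing a result file
# in the required format. File names and the translate() stub are
# hypothetical; plug in an actual MT system.

def translate(sentence):
    # Placeholder MT system: return the Russian translation,
    # or None to mark the sentence as rejected.
    return sentence  # identity stand-in for demonstration only

with open("source.en.txt", encoding="utf-8") as src, \
     open("result.ru.txt", "w", encoding="utf-8") as out:
    for line in src:
        line = line.rstrip("\n")
        if not line:                       # blank line = document boundary
            out.write("\n")                # mirror the input layout
            continue
        number, _, sentence = line.partition("\t")
        translation = translate(sentence)
        if translation is None:            # rejected: keep the number only
            out.write(number + "\n")
        else:
            out.write(number + "\t" + translation + "\n")
```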
The evaluation will be carried out on about 1,000 sentences sampled from the test dataset. Human translators will translate these sentences to ensure gold-standard quality. We will employ two types of evaluation measures: automatic metrics computed against the reference translations, and manual assessment by human judges.
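For the automatic part, the following sketch scores system output against the reference translations using corpus-level BLEU from NLTK, purely as a representative metric; the workshop's actual metric set is not specified above:

```python
# Illustrative only: BLEU stands in for whatever automatic metrics
# the campaign actually uses. Each hypothesis is scored against one
# or more tokenized human reference translations.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One entry per test sentence: a list of reference token lists,
# and the corresponding system output as a token list.
references = [[["нападающий", "забил", "гол"]]]
hypotheses = [["гол", "забил", "нападающий"]]

score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"corpus BLEU: {score:.3f}")
```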
We expect the participants to share the organizational costs, either by taking part in the human assessment or by contributing in monetary form.
20 December 2012 - announcement & data samples
Potential participants are asked to fill in an online form so that the organizers can estimate participation and the resources needed for evaluation. Filling out the form does not imply any obligations.
Pavel Braslavski (Kontur Labs/Ural Federal University)