Question Answering Track
This track is dedicated to the retrieval of answers to
well-formed natural-language questions.
The source dataset is the Narod.ru collection.
Documents from all the archives narod.* and narod_training.* must be indexed.
Task Description for Participating Systems
Each participant is granted access to the Narod.ru collection and a set of queries.
The queries used in the evaluation are selected randomly from the set of
Russian language questions proposed by the participants and the organizers.
The following types of questions are accepted:
- Questions about an attribute or the subject:
- What is ...? (What is anaphoresis?)
- Who is ...? (Who is Nabokov?)
  - Who did ...? (Who invented the bicycle?)
- What/which ...? (Which country won the soccer championship?)
- Questions about the direct object:
- What did ... do? (What did Edison invent?)
- Questions about an adverbial modifier:
- How many/much ...? (How many people live in Moscow?)
- What is the size/length/height/area of ...?
- When? What day? What month? What year? How long?
(What year did the house burn down?)
- Where to? To which country/city? To which continent?
(Where was the cargo sent to on May 18th?)
- Where from? From which country/city? (From which country did the cargo come?)
- Where? In which country/city? On which continent?
(In which city is the Eiffel tower located?)
- Why? (Why was the alarm activated?)
- How? (How to remove a stain from carpet?)
- Questions about the indirect object:
- Preposition + <what> (Of what does water consist?)
- What/which + <word with known semantics>?
- What/which + <word with unknown semantics>?
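As an illustration only, a participating system might first route an incoming question to a handler by coarse type. The sketch below uses hypothetical English interrogative patterns for readability; the track's actual questions are in Russian, so a real system would need Russian patterns:

```python
import re

# Hypothetical pattern-to-type table; order matters ("what did" must be
# tried before a generic "what" pattern would be).
PATTERNS = [
    (r"^(what is|who is|who did|which)\b", "attribute/subject"),
    (r"^what did\b", "direct object"),
    (r"^(how many|how much|when|where|why|how)\b", "adverbial modifier"),
]

def question_type(question):
    """Return a coarse question type for routing; fall back to 'other'."""
    q = question.lower().strip()
    for pattern, qtype in PATTERNS:
        if re.match(pattern, q):
            return qtype
    return "indirect object / other"
```

This is only a first-pass heuristic; questions to the indirect object (preposition + "what") would need deeper syntactic analysis than a prefix match.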
Participants receive the task set for a very short period of time (one day).
The expected result is an ordered list of at most 10 "answers" for each query.
Each answer must be supplied with the URL of the document where the answer was
found and a plain-text snippet of that document, not longer than 300 characters,
that contains the answer.
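These constraints can be checked mechanically before submission. The sketch below assumes each answer is a (url, snippet) pair in rank order; this representation is an assumption, since the track does not prescribe a concrete submission file format:

```python
MAX_ANSWERS = 10      # at most 10 answers per query
MAX_SNIPPET_LEN = 300 # snippet must not exceed 300 characters

def validate_answers(answers):
    """Check one query's ranked answer list against the track constraints.

    `answers` is a list of (url, snippet) pairs -- a hypothetical
    representation, not a format fixed by the track.
    """
    if len(answers) > MAX_ANSWERS:
        raise ValueError("at most 10 answers are allowed per query")
    for url, snippet in answers:
        if len(snippet) > MAX_SNIPPET_LEN:
            raise ValueError("snippet longer than 300 characters")
    return True
```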
The task collection is built in four stages according to the following schedule:
- May 23rd - each participant proposes a definition of "correct" questions with 5-10 examples
- May 27th - final definition; the overall list of questions is formed
- June 10th - participants send 200 questions to the organizers. From each
  group of questions, 50 are filtered out so that the same number of questions
  from each participant is accepted.
- June 15th - the final query set is fixed (500 questions in total)
- number of questions: 500
- instructions for assessors:
The assessor looks through the snippets with answers and the documents where
they were found and tries to answer the following questions:
- Does the snippet contain an answer to the question?
- Having seen only the snippet, do you think it is likely that the document
  contains an answer to the question?
- Does the document contain an answer to the question?
The assessor also formulates a "correct" answer (the "key criterion").
- evaluation method: pooling (pool depth is 50)
- relevance scale:
  - snippet contains an answer / document probably contains an answer /
    document contains an answer / no answer / impossible to evaluate
- official metrics:
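The pooling method above can be sketched as follows: the assessment pool for each question is the union of the top-50 results from every submitted run, and only pooled documents are judged. A minimal sketch, assuming each run is represented as a mapping from question id to a ranked list of document URLs (this data layout is an assumption):

```python
def build_pool(runs, depth=50):
    """Build the per-question assessment pool.

    runs: dict mapping run name -> dict mapping question id -> ranked
    list of document URLs. Returns dict mapping question id -> set of
    URLs, the union of the top-`depth` results from every run.
    """
    pool = {}
    for results in runs.values():
        for qid, ranked_urls in results.items():
            pool.setdefault(qid, set()).update(ranked_urls[:depth])
    return pool
```

Because the pool is a union, a document ranked highly by any single run is judged, while documents outside every run's top 50 are assumed non-relevant.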