Fact Extraction Track
Fact extraction from news reports.
Overview
This track is dedicated to the problems related to fact extraction from texts.
In 2006 the concrete tasks were:
- proper noun extraction
- extraction of named entities of a given type
- extraction of facts of a given type
Task Rules
- Extract all the named entities:
For each given text, a participating system must build a list of named entities.
For each entity, the following information must be provided:
- list of references to usages of the entity in the text (offsets and lengths in bytes)
- (optional) the type of the entity: person/organization/place-name/other
- Extract facts of the following types:
- Who worked/works in the given organization?
- Where did/does the given person work?
- Who is the owner of the given organization?
- What companies did/does the given person/organization own?
Note: Company buyers, sellers, and shareholders are also accepted as owners.
Participants must process the whole collection without using the results of the named entity extraction.
A fact description must include the following information:
- fact type
- reference to the text fragment containing the fact description
(offset and length; the length must not exceed 500 bytes)
- two standardized names of the objects referenced in the fact
- reference to the entity in the text (offset from the beginning of the text fragment)
Participants may perform only the first task; the second one is optional.
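The output described in the task rules can be sketched as two small record types. This is only an illustration: the track's actual submission format is not given here, so the class names, the `works_in` label, the UTF-8 encoding, and the choice of one entity offset per object are all assumptions. The sketch mainly shows how character positions differ from the byte offsets and lengths that the rules require.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_FRAGMENT_BYTES = 500  # fragment length limit from the task rules

Span = Tuple[int, int]  # (offset, length), both in bytes


@dataclass
class Entity:
    mentions: List[Span]          # every usage of the entity in the text
    type: Optional[str] = None    # person / organization / place-name / other


@dataclass
class Fact:
    fact_type: str                   # hypothetical label, e.g. "works_in"
    fragment: Span                   # fragment containing the fact description
    names: Tuple[str, str]           # two standardized object names
    entity_offsets: Tuple[int, int]  # assumed: one offset per object, from fragment start


def byte_span(text: str, start: int, end: int, encoding: str = "utf-8") -> Span:
    """Convert the character range [start, end) into (byte offset, byte length)."""
    offset = len(text[:start].encode(encoding))
    length = len(text[start:end].encode(encoding))
    return offset, length


def find_mentions(text: str, name: str, encoding: str = "utf-8") -> List[Span]:
    """Locate every literal occurrence of `name` and report it in bytes."""
    spans: List[Span] = []
    if not name:
        return spans
    pos = text.find(name)
    while pos != -1:
        spans.append(byte_span(text, pos, pos + len(name), encoding))
        pos = text.find(name, pos + len(name))
    return spans


def is_valid_fact(fact: Fact) -> bool:
    """Check the fragment length limit and that entity offsets fall inside it."""
    _, length = fact.fragment
    if not 0 < length <= MAX_FRAGMENT_BYTES:
        return False
    return all(0 <= off < length for off in fact.entity_offsets)


# Made-up sample text; "é" takes two bytes in UTF-8, so byte offsets
# run one ahead of character positions after the first word.
text = "Café owner Ivan Petrov met Petrov's partner."
person = Entity(mentions=find_mentions(text, "Petrov"), type="person")
fact = Fact("works_in", byte_span(text, 0, len(text)), ("Ivan Petrov", "Café"), (12, 0))
```

In this sample the first mention of "Petrov" starts at character 16 but at byte 17, which is the kind of mismatch a system must account for when the collection uses a multi-byte encoding.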
Evaluation Methodology
Evaluation is carried out in two stages:
- Proper nouns check
A random subset of the news reports in the collection is selected.
Then we evaluate how well participating systems extract the proper nouns found in
this subset of news reports.
Instructions for assessors:
Is the given line a proper noun in the context of the given text fragment? If yes, is it an organization, a person, or a
place?
Possible answers: not a proper noun, organization, person, place, other proper noun
- Facts check
A certain number of proper nouns are selected (the selection procedure is
not yet defined and will be discussed with the participants), and fact extraction for these objects is evaluated.
Instructions for assessors:
Does the given text fragment contain a description of a fact connecting the
following objects: (A, B)?
If yes, which fact type is it?
Possible answers: not a fact, purchase, selling, ownership,
belonging, other.
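One way the assessor answers from the proper nouns check could be turned into numbers is sketched below. The track description does not name its metrics, so the use of precision, the function names, and the sample judgments are all assumptions made for illustration. Each judgment pairs the type a system assigned with the answer an assessor gave.

```python
from typing import List, Optional, Tuple

NOT_PROPER = "not a proper noun"  # assessor answer listed in the instructions

# (type assigned by the system, answer given by the assessor)
Judgment = Tuple[str, str]


def generalized_precision(judgments: List[Judgment]) -> float:
    """Share of extracted strings judged to be a proper noun of any class."""
    if not judgments:
        return 0.0
    correct = sum(1 for _, answer in judgments if answer != NOT_PROPER)
    return correct / len(judgments)


def class_precision(judgments: List[Judgment], cls: str) -> Optional[float]:
    """Among strings the system typed as `cls`, share the assessor also put in `cls`.

    Returns None when the system never used that class.
    """
    answers = [answer for system_type, answer in judgments if system_type == cls]
    if not answers:
        return None
    return sum(1 for answer in answers if answer == cls) / len(answers)


# Made-up judgments, for illustration only.
sample = [
    ("person", "person"),
    ("person", "organization"),
    ("organization", "organization"),
    ("place", NOT_PROPER),
]
overall = generalized_precision(sample)
per_person = class_precision(sample, "person")
```

A recall-style measure would additionally need the set of proper nouns that systems missed, which these per-string judgments alone do not provide.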
Summary
- Document collection: news collection
- Standard metrics:
The metrics are calculated for generalized proper nouns, for each of the proper noun classes, and for the extracted facts.
- Formats: