ROMIP: Russian Information Retrieval Evaluation Seminar


Legal Documents Collection 2004

Description

This collection consists of documents from the legislation of the Russian Federation and was provided by Kodeks. It contains HTML documents and, unlike the Web collections, is much more uniform.

Dataset Parameters
  • Size of HTML data: 1.6 GB
  • Number of pages: 61 000
  • Encoding: cp1251 (Windows-1251)

Rights to Use

Kodeks, the owner of the collection, has granted ROMIP the right to use it. To obtain access to the collection, you must sign the usage agreement.

Data Format

The collection is distributed as XML files in a collection-specific format. The files are split into two groups: legal.* and legal_training.*. Files in the second group contain the documents that were used as the training set in the legal document classification track.
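Because only the file-name patterns of the two groups are specified here, the following is a minimal Python sketch that separates the main collection from the classification training set by those patterns alone. The function name `split_collection_files` and the sample file names are hypothetical; the sketch does not parse the XML content, since its format is not described on this page.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def split_collection_files(names):
    """Partition file names into the main group (legal.*) and the
    classification training group (legal_training.*).

    Hypothetical helper: only the two glob patterns come from the
    collection description; files matching neither are ignored.
    """
    main, training = [], []
    for name in names:
        base = Path(name).name  # match on the file name, not the directory
        # "legal.*" cannot match "legal_training.*" (the 6th character
        # differs), but check the more specific pattern first for clarity.
        if fnmatchcase(base, "legal_training.*"):
            training.append(name)
        elif fnmatchcase(base, "legal.*"):
            main.append(name)
    return main, training


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["legal.001.xml", "legal_training.001.xml", "readme.txt"]
    main, training = split_collection_files(files)
    print(main, training)
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform rather than following the operating system's conventions.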

Tracks in Which the Collection Was Used
  • Ad hoc search in a collection of legal documents
  • Ad hoc search in a mixed collection
  • Similar documents search
  • Classification of legal documents
  • Query-biased summarization