Legal Documents Collection 2007
Description
This collection was created and provided by Kodeks in 2007.
It consists of documents from the
legislation of the Russian Federation, Moscow, and St. Petersburg as of
the second week of December 2006. The collection contains HTML
documents and, unlike the Web collections, is much more uniform.
Features:
- The title of each document is stored in the title field of the document content
- Formatting of documents is done with styles, which are not included in the collection
- Hx tags are not used in the text of documents.
(To detect headers, analyze the P tags
whose class attribute has the value "headertext".)
- A unique feature of this collection is the availability of multiple editions of
the same document. Multiple editions are stored as multiple content tags. The date attribute of
a content tag defines when that edition was added. The initial (first)
revision has no date attribute.
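
The header rule above (P tags whose class attribute is "headertext") can be sketched with Python's standard html.parser module. This is a minimal illustration, not part of the collection's tooling: the names HeaderExtractor and extract_headers are hypothetical, and the input is assumed to be already decoded HTML text.

```python
from html.parser import HTMLParser


class HeaderExtractor(HTMLParser):
    """Collect the text of every <p class="headertext"> element."""

    def __init__(self):
        super().__init__()
        self.headers = []    # finished header strings
        self._buffer = None  # text parts of the header being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new paragraph implicitly ends an unclosed one.
            self._flush()
            classes = (dict(attrs).get("class") or "").split()
            if "headertext" in classes:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace inside the header text.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headers.append(text)
            self._buffer = None

    def close(self):
        super().close()
        self._flush()


def extract_headers(html):
    """Return the header strings found in one HTML document."""
    parser = HeaderExtractor()
    parser.feed(html)
    parser.close()
    return parser.headers
```

For example, extract_headers('<p class="headertext">Article 1</p><p>Body</p>') returns ['Article 1']. The parser lowercases tag and attribute names, so P and CLASS written in upper case are matched as well.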
Dataset Parameters
- Size of HTML data (bz2 archives): 1.6 GB
- Number of pages: 300 000
- Encoding: cp1251
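
Given these parameters, a file from the collection has to be decompressed with bz2 and decoded as cp1251 before the text is usable. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes each archive is a plain bz2 stream of cp1251-encoded text, which the description above suggests but does not spell out.

```python
import bz2

# Encoding stated in the dataset parameters.
COLLECTION_ENCODING = "cp1251"


def decode_archive_bytes(compressed, encoding=COLLECTION_ENCODING):
    """Decompress an in-memory bz2 payload and decode it to str."""
    return bz2.decompress(compressed).decode(encoding)


def read_archive(path, encoding=COLLECTION_ENCODING):
    """Read a bz2-compressed file from disk as decoded text."""
    with bz2.open(path, "rt", encoding=encoding) as handle:
        return handle.read()
```

For archives that do not fit comfortably in memory, iterating over the handle returned by bz2.open line by line may be preferable to calling read().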
Rights to Use
Usage rights are granted to ROMIP by Kodeks, which is the owner of the
collection. To get access to the collection, you must sign the usage agreement.
Data Format
The collection is distributed as XML files of a specific format.
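
Since each edition of a document is stored in its own content tag, the editions can be listed by walking those tags and reading the date attribute. The sketch below is an illustration under stated assumptions, because the exact XML layout is not given here: the wrapper element name document, the lowercase attribute name date, the date format in the sample, and the payload being plain text are all hypothetical. The function name list_editions is hypothetical too.

```python
import xml.etree.ElementTree as ET


def list_editions(document_xml):
    """Return (date, payload) pairs, one per content tag.

    The date is None for the initial revision, which carries no
    date attribute.
    """
    root = ET.fromstring(document_xml)
    editions = []
    for content in root.iter("content"):
        date = content.get("date")  # None when the attribute is absent
        payload = "".join(content.itertext())
        editions.append((date, payload))
    return editions


# Hypothetical sample: element names and date format are assumptions.
sample = (
    "<document>"
    "<content>first text</content>"
    '<content date="2006-12-01">revised text</content>'
    "</document>"
)
```

With this sample, list_editions(sample) returns [(None, 'first text'), ('2006-12-01', 'revised text')]. If the real files encode the HTML payload in some way, an extra decoding step would be needed after reading each content tag.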
Tracks in Which the Collection Was Used
- Ad hoc search in a collection of legal documents
- Ad hoc search in a mixed collection
- Classification of legal documents