
ROMIP Test Collections
We prepared the following collections for evaluation of participating systems:
-
Narod.ru Web collection
A pseudorandom selection of websites from the narod.ru domain
(narod.ru is a free hosting provider in Russia).
The collection consists of 728 000 documents.
-
KM.ru Web collection 2007 (NEW)
The KM.ru collection is a copy of the www.km.ru multiportal. It consists of about 3 000 000 documents.
-
BY.web collection 2007 (NEW)
A subset of pages from the .by domain that were present in the Yandex index in May 2007.
-
DMOZ Web collection
A collection based on the Russian-language section of the
dmoz.org catalog.
This collection is used as a training set in the Web site and
Web page classification tracks.
-
Legal documents collection 2004
A collection of Russian Federation legislative documents, built in 2004. It consists of 61 000 documents.
-
Legal documents collection 2007 (NEW)
A collection of Russian Federation legislative documents, built in 2007. It consists of 300 000 documents.
-
News collection
A set of news reports from 25 different sources covering three non-overlapping time intervals. The size of this collection is about 31 500 documents.