ROMIP Test Collections
We prepared the following collections for evaluation of participating systems:
Narod.ru Web collection
It is a pseudorandom selection of web sites from the domain narod.ru
(narod.ru is a national free hosting provider in Russia).
The collection consists of 728 000 documents.
KM.ru Web collection 2007
The KM.ru collection is a copy of the www.km.ru multiportal. It consists of about 3 000 000 documents.
BY.web collection 2007
It is a subset of pages from the .by domain that were present in the Yandex index in May 2007.
DMOZ Web collection
Collection based on the Russian-language section of the
This collection is used as a training set in the Web site and Web page
classification tracks.
Legal documents collection 2004
Collection of documents from the Russian Federation legislation built in 2004. It consists of 61 000 documents.
Legal documents collection 2007
Collection of documents from the Russian Federation legislation built in 2007. It consists of 300 000 documents.
A set of news reports from 25 different sources covering three non-overlapping time intervals. The size of this collection is about 31 500 documents.
This collection was created in 2008. It is a subset of Flickr photo
Movie review collection
Movie review collection from the recommendation service Imhonet.ru. It contains reviews of movies of various genres.
Book review collection
Book review collection from the recommendation service Imhonet.ru. It contains reviews of books of various genres.
Digital camera review collection
Digital camera review collection from Yandex.Market.
The collection of blog posts with sentiment markup
Each post is about an object from one of three domains: movies, books, or digital cameras. It has sentiment polarity and target object markup.
The collection of quotes from the news flow with sentiment markup
This collection contains quotes from various news sources. Each quote has a sentiment polarity score.
Russian Product Meta-Domain Sentiment Lexicon
This sentiment lexicon contains sentiment words extracted from text collections in various
domains (films, books, computer games, mobile phones, digital cameras).
If you use this lexicon for your research or a publication, please cite [Chetviorkin I. and Loukachevitch N.
Extraction of Russian Sentiment Lexicon for Product Meta-Domain // In Proceedings of COLING 2012: Technical Papers, pages 593–610].
This lexicon is freely available for non-commercial use.