ROMIP: Russian Information Retrieval Evaluation Seminar


ROMIP Test Collections

We prepared the following collections for evaluation of participating systems:
  • Narod.ru Web collection
    A pseudorandom selection of websites from the narod.ru domain (narod.ru is a national free hosting provider in Russia). The collection consists of 728 000 documents.

  • KM.ru Web collection 2007
    The KM.ru collection is a copy of the www.km.ru multiportal. It consists of about 3 000 000 documents.

  • BY.web collection 2007
    It is a subset of pages from the .by domain that were present in the Yandex index in May 2007.

  • DMOZ Web collection
    A collection based on the Russian-language section of the dmoz.org catalog. It is used as a training set in the Web site and Web page classification tracks.

  • Legal documents collection 2004
    A collection of Russian Federation legislative documents, built in 2004. It consists of 61 000 documents.

  • Legal documents collection 2007
    A collection of Russian Federation legislative documents, built in 2007. It consists of 300 000 documents.

  • News collection
    A set of news reports from 25 different sources covering three non-overlapping time intervals. The size of this collection is about 31 500 documents.

  • Flickr collection
    This collection was created in 2008. It is a subset of the Flickr photo collection.

  • Movie review collection
    A collection of movie reviews from the recommendation service Imhonet.ru. It contains reviews of movies in various genres.

  • Book review collection
    A collection of book reviews from the recommendation service Imhonet.ru. It contains reviews of books in various genres.

  • Digital camera review collection
    Digital camera review collection from Yandex.Market.

  • The collection of blog posts with sentiment markup
    Each post is about an object from one of three domains: movies, books, or digital cameras. Each post is marked up with its sentiment polarity and target object.

  • The collection of quotes from the news flow with sentiment markup
    This collection contains quotes from various news sources. Each quote has a sentiment polarity score.

  • Russian Product Meta-Domain Sentiment Lexicon
    This sentiment lexicon contains sentiment words extracted from text collections in various domains (films, books, computer games, mobile phones, digital cameras).
    If you use this lexicon for your research or a publication, please cite [Chetviorkin I. and Loukachevitch N. Extraction of Russian Sentiment Lexicon for Product Meta-Domain // In Proceedings of COLING 2012: Technical Papers, pages 593–610].
    This lexicon is freely available for non-commercial use.