ROMIP: Russian Information Retrieval Evaluation Seminar




Here is detailed information about the sixth cycle of ROMIP. Results and ROMIP'2008 participants' reports are available in the Publications section.


April 22, 2008
Official start of ROMIP'2008. The call for participation was published.
May 16, 2008
The list of ROMIP'2008 participants was published.
June 2, 2008
Classification and news clustering tracks were officially launched.
June 3, 2008
Ad hoc search in a collection of legal documents was officially launched.
June 9, 2008
Tasks for the content-based image retrieval track were sent out.
June 12, 2008
The following tracks were officially launched:
It was decided to cancel the QA track in 2008 due to an insufficient number of applications for participation and limited resources available for evaluation.
June 17, 2008
Ad hoc search in a mixed collection was officially launched.
September 29, 2008
The evaluation results for all tracks were sent to the participants.
October 9, 2008
Proceedings of ROMIP'2008 (in Russian) were published.
October 13, 2008

The ROMIP'2008 workshop took place in Dubna on October 9, 2008. It was co-located with RCDL'2008.

The agenda included 14 reports from participants and a round-table discussion about ROMIP's future.

Organizing Committee

  • Mikhail Ageev (Moscow State University, Russia)
  • Alexander Antonov (Galaktika-Zoom, Moscow, Russia)
  • Pavel Braslavski (Yandex, Ekaterinburg, Russia)
  • Maxim Gubin (IAC Search & Media, USA)
  • Boris Dobrov (UIS RUSSIA, Moscow, Russia)
  • Mikhail Kostin (Mail.Ru, Moscow, Russia)
  • Igor Kuralenok (St. Petersburg State University, Russia)
  • Igor Nekrestyanov (St. Petersburg State University, Russia)
  • Marina Nekrestyanova (NebuAd, St. Petersburg, Russia)
  • Vladimir Pleshko (RCO, Moscow, Russia)
  • Ilya Segalovich (Yandex, Moscow, Russia)
  • Vlad Shabanov (Vertical Search, Moscow, Russia)
  • Natalia Vassilieva (HP Labs, St. Petersburg, Russia)


  • Branch Image
    Branch Image is a real-time research search engine for image retrieval and classification. Branch Image clusters images in a space of image features that carry different weights. The weights are obtained from subjective experiments. A number of high-level and low-level image features have already been developed, and the research is ongoing.

  • EventSupervisor
    Topic detection and tracking system based on a modified version of the CMU TDT algorithm that takes into account some semantic and stylistic aspects.

  • Exactus

  • Galaktika-Zoom
    "Galaktika-Zoom" is a text mining solution working with unstructured data. The system includes proprietary tools for textual data repository creation and management, full-text search, automatic structuring, and data analysis tools based on linguistic, mathematical and statistical methods.

  • HeadHunter
    Experimental search engine using both standard and custom search algorithms. At the seminar, we plan to test several relevance estimation algorithms based on in-depth analysis of the indexed documents.

  • IFM
    IFM is an experimental system for near-duplicate image retrieval and detection. The system is based on interest point detection methods such as Difference of Gaussians, Laplacian of Gaussian, and others. The main idea is to compute local image features that are robust to changes caused by different transformations. Instead of using low-level global features, the image is described by a set of local interest point feature vectors. Image comparison thus becomes a comparison of local interest point sets. Solving this task requires scalable indexing and retrieval methods. The goal of the research is to compare and generalize existing methods.

  • ImSim
    ImSim is a near-duplicate detection system developed by HP Labs. Our approach involves a descriptor extraction step for every image, followed by a hashing algorithm. Image descriptors are calculated from texture and color features for regions of interest (ROI) built around key points. The SIFT algorithm is used to find key points in the picture. The texture of an ROI is described by smoothed and normalized gray-scale intensity levels. A color histogram is used as the color feature. Image descriptors are matched using the Locality Sensitive Hashing (LSH) approach. Every descriptor is mapped to a hash: the closer the descriptors are to each other in cosine distance, the higher the probability that their hashes are identical. The descriptors we use are designed to work well with the cosine similarity measure.

  • LISA
    The main idea of the method proposed for content-based image retrieval is to transform a source image into a special form. The core representation of an input image is the so-called Matrix of Brightness Variation. A special measure is introduced to compare the similarity of given images. This measure is a weighted pseudometric that involves the signs of the partial derivatives of the brightness function of the color image components. The proposed approach can be used both for content-based image retrieval and for near-duplicate detection.

  • mnoGoSearch
    Open source search engine using a database as its repository.

  • NNCS
    Context-dependent classification and search system based on a representation of the text corpus as an associative semantic network.

  • PhotoFinder

  • RCO
    The RCO team focuses on research in computational linguistics and on developing text analysis solutions for full-text databases, data warehouses, and BI systems. In the workshop we plan to run several experiments on text categorization and news clustering tasks.

  • RMaxG
    A library and a set of test utilities developed for experiments in the areas of data compression, optimal indexing, statistical modeling, and machine learning.


  • Search
    Information retrieval system, version mod.2. The system is based on traditional algorithms combined with our own developments.

    Research project aimed at creating and evaluating a recurrent thematic Web search system.

  • Subject Search Sleuth (SSS)
    Subject Search Sleuth (SSS) is a text search and annotation engine based on the fast non-reconsidering full-text fuzzy pattern search algorithm developed by Sergey Kryloff. The SSS algorithm supports cases where search terms are absent, swapped, or interleaved with other terms in the answer. Because it is based on the notion of a Q-Term (instead of words, their canonical forms, or stems), SSS is very flexible in supporting multiple languages. The current version supports 40 languages, including some Asian languages, Arabic, Indonesian, and Hebrew.


  • Yandex
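
The ImSim entry above describes mapping each descriptor to a hash so that descriptors close in cosine distance are likely to get identical hashes. A minimal sketch of one well-known scheme with this property, random-hyperplane hashing, is given below. It is an assumption that this resembles what ImSim does: the description does not say which LSH family is used, and the function names (make_hyperplanes, lsh_hash, hamming) are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components.

    Assumption: descriptors are plain lists of floats of length `dim`.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_hash(descriptor, hyperplanes):
    """Map a descriptor to a tuple of bits, one sign bit per hyperplane.

    Two descriptors get the same bit for a hyperplane unless it separates
    them, which becomes less likely as the angle between them shrinks.
    """
    bits = []
    for normal in hyperplanes:
        dot = sum(x * w for x, w in zip(descriptor, normal))
        bits.append(1 if dot >= 0.0 else 0)
    return tuple(bits)


def hamming(a, b):
    """Count the positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16, seed=1)
    v = [0.2, 0.5, 0.1, 0.9]
    scaled = [2.0 * x for x in v]    # same direction, cosine distance 0
    opposite = [-x for x in v]       # opposite direction
    print(hamming(lsh_hash(v, planes), lsh_hash(scaled, planes)))
    print(hamming(lsh_hash(v, planes), lsh_hash(opposite, planes)))
```

In this sketch the hash depends only on the direction of the descriptor, not its length, which is why such a scheme pairs naturally with descriptors designed for cosine similarity. A practical system would typically use several short hashes per descriptor to trade recall against the number of candidate matches.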