Here is detailed information about the sixth cycle of ROMIP:
Results and ROMIP'2008 participants' reports are available in the Publications section.
- April 22, 2008
Official start of ROMIP'2008. The call for participation
- May 16, 2008
List of ROMIP'2008 participants was
- June 2, 2008
Classification and news clustering tracks were officially launched.
- June 3, 2008
Ad hoc search in a collection of legal
documents was officially launched.
- June 9, 2008
Tasks for the content-based image retrieval
track were sent out.
- June 12, 2008
The following tracks were officially launched:
It was decided to cancel the QA track in 2008 due to an insufficient number of
applications for participation and limited resources available for
- June 17, 2008
Ad hoc search in a mixed collection was officially launched.
- September 29, 2008
The evaluation results for all tracks were sent to the participants.
- October 9, 2008
Proceedings of ROMIP'2008 (in Russian) were published.
- October 13, 2008
ROMIP'2008 workshop took place in Dubna on October 9, 2008. It was collocated with RCDL'2008.
The agenda included 14 reports from participants and a round-table discussion about ROMIP's future.
- Mikhail Ageev (Moscow State University, Russia)
- Alexander Antonov (Galaktika-Zoom, Moscow, Russia)
- Pavel Braslavski (Yandex, Ekaterinburg, Russia)
- Maxim Gubin (IAC Search & Media, USA)
- Boris Dobrov (UIS RUSSIA, Moscow, Russia)
- Mikhail Kostin (Mail.Ru, Moscow, Russia)
- Igor Kuralenok (St. Petersburg State University, Russia)
- Igor Nekrestyanov (St. Petersburg State University, Russia)
- Marina Nekrestyanova (NebuAd, St. Petersburg, Russia)
- Vladimir Pleshko (RCO, Moscow, Russia)
- Ilya Segalovich (Yandex, Moscow, Russia)
- Vlad Shabanov (Vertical Search, Moscow, Russia)
- Natalia Vassilieva (HP Labs, St. Petersburg, Russia)
Branch Image is a real-time research search engine for image retrieval
and classification. Branch Image clusters images in a space of image
features that carry different weights. The weights are obtained from
subjective experiments. A number of high-level and low-level image features
have already been developed, and research is ongoing.
A topic detection and tracking system based on a modified version of the CMU TDT
algorithm that takes into account some semantic and stylistic aspects.
"Galaktika-Zoom" is a text mining solution working with unstructured data.
The system includes
proprietary tools for textual data repository creation and management,
automatic structuring, and data analysis tools based on linguistic,
mathematical and statistical
An experimental search engine that uses both standard and custom search
algorithms. At the seminar,
we plan to test several relevance estimation algorithms based on
in-depth analysis of
IFM is an experimental system for near-duplicate image retrieval and detection.
The system is based on interest point detection methods such as Difference
of Gaussians, Laplacian of Gaussian, and others.
The main idea is computation of local image features, which are robust to
changes due to
Instead of using low-level global features, the image is described by a set
of interest point feature vectors.
Thus the image comparison becomes a comparison of local interest point sets.
For this task to be solved, scalable indexing and retrieval methods are
The goal of the research is to compare and generalize existing methods.
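To illustrate what comparing two images as sets of local interest point vectors can look like, here is a minimal Python sketch. It is not the IFM implementation: the function names, the Euclidean distance, and the nearest-neighbour ratio test with a threshold of 0.8 are all assumptions chosen for illustration, and the brute-force search stands in for the scalable indexing mentioned above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_matches(query, candidate, ratio=0.8):
    """Count query vectors whose nearest neighbour in `candidate` is
    clearly closer than the second nearest (ratio test, hypothetical
    threshold)."""
    matches = 0
    for q in query:
        dists = sorted(euclidean(q, c) for c in candidate)
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        if dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def set_similarity(query, candidate, ratio=0.8):
    """Fraction of query vectors that found an unambiguous match."""
    if not query:
        return 0.0
    return count_matches(query, candidate, ratio) / len(query)


# Toy 2-D "descriptors"; real interest point descriptors are much longer.
image_a = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
near_duplicate = [[0.1, 0.0], [10.0, 0.1], [0.0, 9.9]]
unrelated = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]

print(set_similarity(image_a, near_duplicate))
print(set_similarity(image_a, unrelated))
```

The score is high when most local features of one image have an unambiguous counterpart in the other, which is the behaviour expected for near-duplicates.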
ImSim is a near-duplicate detection system developed by HP Labs.
Our approach involves a descriptor extraction step for every image, followed
by a hashing algorithm. Image descriptors are calculated from texture and
color features for regions of interest (ROI) built around key points. The
SIFT algorithm is used to find key points in the picture.
The texture of an ROI is described by smoothed and normalized gray-scale
intensity levels. A color histogram is used as the color feature.
Image descriptors are matched using the Locality Sensitive Hashing (LSH)
approach. Every descriptor is mapped to a hash: the closer the descriptors
are to each other in cosine distance, the higher the probability that their
hashes are identical. The descriptors we use are designed to work well with
the cosine similarity measure.
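The hashing property described above can be illustrated with random-hyperplane LSH, a standard LSH family for cosine similarity. The text does not say which LSH family ImSim uses, so this is only a sketch under that assumption; the function names, the 8-bit hash length, and the toy descriptors are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random Gaussian hyperplane normals of size `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_bits)]


def lsh_hash(descriptor, hyperplanes):
    """Map a descriptor to an integer: one bit per hyperplane, set when
    the descriptor lies on the non-negative side of that hyperplane."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, descriptor))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def collision_rate(a, b, num_bits=8, trials=200):
    """Estimate how often a and b receive identical hashes over many
    independently drawn hyperplane sets."""
    hits = 0
    for seed in range(trials):
        planes = make_hyperplanes(num_bits, len(a), seed)
        if lsh_hash(a, planes) == lsh_hash(b, planes):
            hits += 1
    return hits / trials


# Toy descriptors: `near` points almost the same way as `base`, `far` does not.
base = [1.0, 0.0, 0.0, 0.5]
near = [1.0, 0.05, 0.0, 0.5]
far = [-0.2, 1.0, 0.7, 0.0]

print(collision_rate(base, near))
print(collision_rate(base, far))
```

Each bit depends only on the sign of a dot product, so the hash ignores vector length and depends only on direction. Descriptors separated by a small angle therefore collide far more often than descriptors separated by a large one.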
The main idea of the method proposed for
content-based image retrieval is to transform a source image into
a special form. The core representation of an input image is the
so-called Matrix of Brightness Variation. To compare the similarity of
given images, a special measure is introduced. This measure is a
weighted pseudometric that involves the signs of the partial derivatives of
the brightness function of the color image components. The proposed approach
can be used for both content-based image retrieval and near-duplicate detection.
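The text does not define the Matrix of Brightness Variation or the exact measure, so the following Python sketch only illustrates the general idea of a weighted pseudometric built from signs of partial derivatives. Everything in it is an assumption: the finite-difference derivatives, the single grayscale channel, the weights `w_x` and `w_y`, and the function names. For a color image, the same computation could be applied to each component separately.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def derivative_signs(image):
    """Signs of the horizontal and vertical finite differences of a
    brightness grid given as a list of equally long rows."""
    rows, cols = len(image), len(image[0])
    dx = [[sign(image[r][c + 1] - image[r][c]) for c in range(cols - 1)]
          for r in range(rows)]
    dy = [[sign(image[r + 1][c] - image[r][c]) for c in range(cols)]
          for r in range(rows - 1)]
    return dx, dy


def mismatch(signs_a, signs_b):
    """Fraction of positions where two sign maps disagree."""
    total = sum(len(row) for row in signs_a)
    if total == 0:
        return 0.0
    diff = sum(1 for ra, rb in zip(signs_a, signs_b)
               for a, b in zip(ra, rb) if a != b)
    return diff / total


def sign_distance(img_a, img_b, w_x=0.5, w_y=0.5):
    """Weighted sum of sign disagreements in x and y.
    Assumes both images have the same dimensions."""
    ax, ay = derivative_signs(img_a)
    bx, by = derivative_signs(img_b)
    return w_x * mismatch(ax, bx) + w_y * mismatch(ay, by)


base = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
brighter = [[v + 10 for v in row] for row in base]   # uniform brightness shift
inverted = [[-v for v in row] for row in base]       # every gradient reversed

print(sign_distance(base, brighter))
print(sign_distance(base, inverted))
```

Two different images can be at distance zero, as with the uniform brightness shift here, which is why such a measure is a pseudometric rather than a metric.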
An open-source search engine that uses a database as its repository.
A context-dependent classification and search system based on representing
the text corpus as an associative semantic network.
The RCO team focuses on research in computational linguistics and the
development of text analysis solutions for full-text databases,
data warehouses, and BI systems. At the workshop, we plan to run
several experiments on the text categorization and news clustering tasks.
A library and a set of test utilities developed
for experiments in the areas of data compression, optimal indexing,
statistical modeling, and machine learning.
Information retrieval system, version mod.2. The system is based on
traditional algorithms combined with our own developments.
A research project aimed at creating and evaluating a recurrent
thematic Web search system.
Subject Search Sleuth (SSS)
Subject Search Sleuth (SSS) is a text search and annotation engine based on
the fast non-reconsidering full-text fuzzy pattern search algorithm
developed by Sergey Kryloff. The SSS algorithm handles cases where search
terms are absent, swapped, or interleaved with other terms in the answer.
Because it is based on the notion of a Q-Term (rather than words, their
canonical forms, or stems), SSS is very flexible in supporting multiple
languages. The current version supports 40 languages, including some Asian
languages, Arabic, Indonesian, and Hebrew.