ROMIP'2010

Here is detailed information about the eighth cycle of ROMIP:

Chronicle
Organizing Committee
Participants (detailed table)
Tracks:
- Ad hoc search
- Classification
- News clustering
- Query-biased summarization
- Similar documents search (by a sample document or a text fragment)
- Image tracks
  - Content-based image retrieval
  - Near duplicates detection

Results and ROMIP'2010 participants' reports are available in the Publications section.

Chronicle (short)

October 15, 2010: ROMIP'2010 took place in Kazan and was co-located with RCDL'2010. More than 60 researchers from academia and industry took part in the conference. There were 10 oral presentations followed by a round table.
ROMIP'2010 proceedings can be found in the full list of publications (in Russian).
May 24, 2010: Join the ROMIP Facebook group!
May 12, 2010: The preliminary list of participants has been published.
April 19, 2010: Official start of the 8th ROMIP cycle! The deadline for applications is May 15, 2010. Please fill in the registration form to participate in ROMIP'2010.
Official CFP for ROMIP'2010 in English.

Organizing Committee

Mikhail Ageev (Moscow State University, Russia)
Alexander Antonov (Galaktika-Zoom, Moscow, Russia)
Pavel Braslavski (Yandex, Ekaterinburg, Russia)
Maxim Gubin (Facebook, USA)
Boris Dobrov (UIS RUSSIA, Moscow, Russia)
Mikhail Kostin (Mail.Ru, Moscow, Russia)
Igor Kuralenok (St. Petersburg State University, Russia)
Igor Nekrestyanov (Oracle Corporation, Russia)
Marina Nekrestyanova (RedAril, St. Petersburg, Russia)
Vladimir Pleshko (RCO, Moscow, Russia)
Ilya Segalovich (Yandex, Moscow, Russia)
Vlad Shabanov (Vertical Search, Moscow, Russia)
Natalia Vassilieva (HP Labs, St. Petersburg, Russia)

Participants

ATSearch-2010
ATSearch-2010 is an information retrieval system based on solutions developed in the AT.Poisk project.
Dislexer
An experimental search system that uses parsing algorithms to resolve ambiguity in queries and in the document collection.
Exactus
Extractor
Experimental information extraction system.
Galaktika-Zoom
"Galaktika-Zoom" is a text mining solution designed for processing of large-scale unstructured data collections.
The system includes tools for textual data repository creation and management, full-text search, automatic structuring and data analysis based on mathematical, statistical, and linguistic methods.
IFM3
The IFM3 content-based image retrieval system applies a text-retrieval approach to image analysis. Images are modeled with tf/idf-like vectors built upon a feature dictionary. To prepare the dictionary, a large set of SURF interest point descriptors is hashed with LSH and clustered under a learned metric.
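The idea of tf/idf-like image vectors can be illustrated with a minimal sketch. It assumes that the interest point descriptors have already been quantized into integer "visual word" ids (the SURF extraction, LSH hashing and metric learning steps are not shown), and the function names and toy data are hypothetical. This is not the IFM3 implementation, only a generic bag-of-visual-words example.

```python
import math
from collections import Counter


def tfidf_vectors(images):
    """Build sparse tf-idf vectors from visual-word ids.

    images: dict mapping an image name to a list of visual-word ids
            (hypothetical input; quantization is assumed to be done).
    Returns a dict mapping each image name to {word_id: weight}.
    """
    n_images = len(images)
    # Document frequency: in how many images each visual word occurs.
    df = Counter()
    for words in images.values():
        df.update(set(words))

    vectors = {}
    for name, words in images.items():
        tf = Counter(words)
        total = len(words)
        vectors[name] = {
            w: (count / total) * math.log(n_images / df[w])
            for w, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection: images "a" and "b" share most visual words, "c" does not.
images = {
    "a": [1, 1, 2, 3],
    "b": [1, 2, 2, 3],
    "c": [7, 8, 9, 3],
}
vectors = tfidf_vectors(images)
sim_ab = cosine(vectors["a"], vectors["b"])
sim_ac = cosine(vectors["a"], vectors["c"])
```

In this toy data, visual word 3 occurs in every image, so its idf weight is zero and it does not contribute to the similarity, which is the same effect tf-idf has on very common terms in text retrieval.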
Images.Yandex
The algorithms used in image retrieval.
KC classifier
The classifier assigns documents to topics using per-topic keywords. It was presented at the RCDL workshop (article) and is used to classify hosts in the Yandex search engine.
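A keyword-based topic classifier of this general kind can be sketched as follows. The topic names, keyword lists, scoring rule and the `min_hits` threshold are all assumptions made for illustration; the source does not describe how the KC classifier scores or selects topics.

```python
import re
from collections import Counter

# Hypothetical topics and keyword sets, chosen only for this example.
TOPIC_KEYWORDS = {
    "sports": {"football", "match", "team", "goal"},
    "finance": {"bank", "loan", "market", "stock"},
}


def classify(text, topic_keywords=TOPIC_KEYWORDS, min_hits=1):
    """Return the topic whose keywords occur most often in the text.

    Returns None when no topic reaches min_hits keyword occurrences.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    scores = {
        topic: sum(counts[k] for k in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


label_sports = classify("The team scored a late goal in the match")
label_finance = classify("The bank raised its loan rates")
label_none = classify("Nothing relevant here")
```

A real system would likely add morphological normalization (important for Russian) and keyword weights; the sketch counts exact lowercase matches only.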
MPP
A clustering system that uses well-known machine learning problems to build metrics.
PhotoFinder
PhotoFinder is a research project in the area of content-based image retrieval. Several color and texture feature extraction algorithms are implemented in the scope of the project. The main problem under research is the fusion of various independent retrieval methods. In the scope of ROMIP 2010 we would like to evaluate our adaptive fusion method for low-level color-based features (color histograms and color moments).
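To make the two low-level features concrete, here is a minimal sketch that computes a color histogram and color moments for an image given as a list of RGB pixels, and combines the two distances with a fixed weight. The fixed weight `w_hist` is an assumption that stands in for the adaptive fusion method mentioned above, which the source does not describe; all function names are hypothetical.

```python
import math


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells.

    pixels: list of (r, g, b) tuples with values in 0..255.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = len(pixels)
    return [h / total for h in hist]


def color_moments(pixels):
    """Mean and standard deviation of each channel, scaled to 0..1."""
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] / 255.0 for p in pixels]
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        features += [mean, std]
    return features


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def fused_distance(p, q, w_hist=0.5):
    """Fixed-weight linear fusion of the two feature distances.

    Assumption: a constant weight replaces any adaptive scheme.
    """
    # L1 between two normalized histograms is at most 2.
    d_hist = l1_distance(color_histogram(p), color_histogram(q)) / 2.0
    # Six moment features; dividing by 6 keeps the value below 1.
    d_mom = l1_distance(color_moments(p), color_moments(q)) / 6.0
    return w_hist * d_hist + (1.0 - w_hist) * d_mom


# Toy single-color "images".
red = [(250, 10, 10)] * 4
dark_red = [(200, 20, 20)] * 4
blue = [(10, 10, 250)] * 4

d_same = fused_distance(red, red)
d_near = fused_distance(red, dark_red)
d_far = fused_distance(red, blue)
```

Here the two red images fall into the same histogram cell, so only the color moments separate them, while the blue image differs on both features. This shows why combining the two descriptors can rank results more finely than either one alone.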
PIRS
PIRS is a training system developed as part of educational research work. Its content-based image retrieval methods are based on extracting and combining image features of different levels. While developing the search methods, precision/recall metrics were preferred over computational efficiency.
PTCS
The Parallel Text Classification System is based on Support Vector Machines and works on multicore/multiprocessor computers and clusters.
RCO
The RCO team focuses on research in the area of computational linguistics and on the development of text analysis solutions for full-text databases, data warehouses and BI systems. In the workshop we are planning to run several experiments on news clustering tasks.
ROOKEE
A research project that examines several problems of information retrieval:
1. development and evaluation of methodologies for context-dependent annotation;
2. development and evaluation of algorithms for the thematic classification of web sites and web pages.
Search KM.ru
The KM.RU search engine is based on traditional IR algorithms and our own R&D.
SKAT
SKAT is a text classification and analysis system. The system is based on the combined use of machine learning and rule-based methods of text classification. The goal of participation is the experimental investigation of new approaches and methods for text classification.
Sophia
Sophia is a cluster-based search engine. It is based on a contextual document clustering algorithm that can be applied to large document collections. Previously the system was tested on a set of collections including newspaper publications, patent abstracts and Medline (all in English). Now our goal is to test the system and the clustering algorithm on a collection of documents in Russian.
SSS
UIS RUSSIA

UIS RUSSIA has been on the Internet since 2000, serving as a collective thematic resource in the social and political sciences with free access for Russian universities and academia, public libraries and NGOs, as well as for individual users: professors, researchers, students and the general public.

The project has been developed since 1993 through the joint efforts of the Moscow State University Research Computing Center and the Autonomous Non-commercial Organization Center for Information Research, with support from grants of Russian and foreign foundations. The team consists of 25 specialists and is based at the Moscow State University Research Computing Center.

The main element of the Automatic Linguistic Text Processing (ALTP) technology is the Thesaurus on Social and Political Domain; its current version covers 110,000 descriptors and terms/synonyms. Thesaurus-based terminological analysis provides conceptual indexing, classification and annotation of electronic text corpora. The technology is used to process all types of data and document collections for social research.
Umba

The question answering system Umba is a general-purpose open-domain meta-search system. It derives short factoid answers from a given text collection for questions posed in natural language (Russian).

The system follows the generic architecture used by participants of the TREC and CLEF QA tracks. The QA task is divided into subtasks, each solved with a naive algorithm. One of the subtasks, answer proving, is also solved by the author's own method, which analyzes semantic relationships between words.

The goal of participation is to measure the performance gain of the author's new method over the original naive implementation. Multiple runs of the method with different parameters are planned.

The author intends to use the evaluation results in the experimental section of a Candidate of Sciences thesis.
Yandex.Server