ATSearch-2010 information search system based on decisions received by
An experimental search system that uses parsing algorithms to resolve
ambiguity in queries and in the document collection.
Experimental information extraction system.
"Galaktika-Zoom" is a text mining solution designed for processing of
large-scale unstructured data collections.
The system includes tools for creating and managing textual data
repositories, full-text search, automatic structuring, and data
analysis based on mathematical, statistical, and linguistic
The IFM3 content-based image retrieval system applies a text-style approach
to image analysis. Images are modeled with tf/idf-like vectors built upon a
feature dictionary. To prepare the dictionary, a large set of SURF interest
point descriptors is hashed with LSH and clustered in terms of learned
The algorithms which are used in image retrieval.
The classifier assigns documents to themes using per-theme keywords.
It was presented at the RCDL workshop (article).
It is used to classify hosts in the Yandex search engine.
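
The keyword-based idea above can be illustrated with a minimal sketch. This is
not the participants' implementation: the theme names, the keyword lists, and
the function name classify are all hypothetical, and scoring by raw keyword
counts is an assumption made only for illustration.

```python
import re
from collections import Counter

# Hypothetical themes and keywords; the real keyword lists are not given.
THEME_KEYWORDS = {
    "sports": {"football", "match", "team"},
    "finance": {"bank", "loan", "market"},
}

def classify(text, theme_keywords=THEME_KEYWORDS):
    """Return the theme whose keywords occur most often, or None if none occur."""
    # Count lowercase alphabetic tokens; a missing token counts as 0.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        theme: sum(tokens[word] for word in words)
        for theme, words in theme_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, classify("The team won the football match") selects the
hypothetical "sports" theme, and a text containing none of the keywords
yields None.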
A clustering system using well-known machine learning problems for building
PhotoFinder is a research project in the area of content-based image
retrieval. Several color and texture feature extraction algorithms are
implemented in the scope of the project. The main problem under research is
the fusion of various independent retrieval methods.
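
As an illustration of what fusing independent retrieval methods can look like,
here is a minimal sketch of weighted score fusion with min-max normalization.
It is a generic technique and not the project's own fusion method; the
function names min_max and fuse, the fixed weights, and the sample scores are
assumptions.

```python
def min_max(scores):
    """Rescale one method's scores to [0, 1] so that methods are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(runs, weights):
    """Combine per-method score dicts into one ranking (best first)."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max(run).items():
            # A document missing from a run simply contributes nothing for it.
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical similarity scores from two independent methods.
color = {"a": 0.9, "b": 0.5, "c": 0.1}
texture = {"a": 0.2, "b": 0.8, "c": 0.4}
ranking = fuse([color, texture], [0.5, 0.5])
```

With equal weights the sample data ranks image "b" first, because it scores
reasonably well under both methods; giving all the weight to the color run
reproduces the color-only ranking.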
In the scope of ROMIP 2010 we would like to evaluate our adaptive fusion
method for low-level color-based features (color histograms and color
PIRS is a training system developed as part of educational research work.
Context-based image retrieval methods are based on extracting and combining
image features of different levels. While developing the search methods,
precision/recall metrics were preferred over computational efficiency.
The Parallel Text Classification System is based on Support Vector Machines
and runs on multicore/multiprocessor computers and clusters.
The RCO team focuses on research in computational linguistics and on the
development of text analysis solutions for full-text databases,
data warehouses, and BI systems. At the workshop we plan to run
several experiments on news clustering tasks.
A research project that examines several problems of information
- development and evaluation methodologies for context-dependent
- development and evaluation of algorithms for the thematic classification
of web sites and web pages.
The KM.RU search engine is based on traditional IR algorithms and our own
SCAT is a text classification and analysis system. The system is based on
the integrated use of machine learning and rule-based methods of text
The goal of participation is the experimental investigation of new approaches
and methods for text classification.
Sophia is a cluster-based search engine. It is based on a contextual document
clustering algorithm that can be applied to large document collections.
Previously the system was tested on a set of collections including newspaper
publications, patent abstracts, and Medline (all in English). Now our goal is
to test the system and the clustering algorithm on a collection of documents in
UIS RUSSIA has been on the Internet since 2000, serving as a shared thematic
resource in the social and political sciences with free access for Russian
universities and academia, public libraries, and NGOs, as well as for
individual users: professors, researchers, students, and the general public.
The project has been under development since 1993 through the joint efforts
of the Moscow State University Research Computing Center and the Autonomous
Non-commercial Organization Center for Information Research, and is supported
by grants from Russian and foreign foundations. The team consists of 25
specialists and is based at the Moscow State University Research
The main element of the Automatic Linguistic Text Processing (ALTP) technology
is the Thesaurus on Social and Political Domain; the current version
covers 110,000 descriptors and terms/synonyms. Thesaurus-based
terminological analysis provides conceptual indexing, classification,
and annotation of electronic text corpora. The technology is used to
process all types of data and document collections for social research.
The Umba question answering system is a general-purpose open-domain
meta-search system. It derives short factoid answers from a given text
collection for questions posed in natural language (Russian).
The system follows the generic architecture used by participants of the TREC
and CLEF QA tracks. The QA task is divided into subtasks, each solved using
a naive algorithm. One of the subtasks, answer proving, is also solved by the
author's method. The method employs analysis of semantic relationships between
The goal of participation is to measure the performance gain of the author's
new method over the original naive implementation. Multiple runs of
the method with different parameters will be carried out.
The author intends to use the evaluation results in the experimental section
of a Candidate of Science thesis.