1st Russian Summer School in Information Retrieval
September 5-12, 2007, Ekaterinburg

Organizers and Sponsors

ROMIP, Ural state university

General sponsor

Yandex

Silver sponsor

Microsoft

Bronze sponsors

Sun Microsystems, Google

Schedule

All school activities will be held in the Department of Mathematics and Mechanics, Ural state university (Turgeneva, 4). Lectures will be held in Rm. 513 and practical training sessions in Rm. 514; coffee breaks will be served in Rm. 507 and lunches in the canteen on the ground floor.

 

Time         Sep 5, We     Sep 6, Th     Sep 7, Fr     Sep 8, Sa     Sep 9, Su     Sep 10, Mo    Sep 11, Tu    Sep 12, We
9.00-10.30                 MLA           MLA           HAW           ANNS          ATCM          ATCM          TRSE
10.30-11.00                break         break         break         break         break         break         break
11.00-12.30                MLA           MLA           HAW           ANNS          ATCM          ATCM          TRSE
12.30-13.30                lunch         lunch         lunch                       lunch         lunch         lunch
13.30-14.30                MSWS          YSC           YSC           city tour     YSC           search cup    departure
15.00-16.30  registration  MIR           MIR           HAW                         ANNS          TRSE
16.30-17.00                break         break         break                       break         break
17.00-18.30                MIR           MIR           HAW                         ANNS          TRSE

after 19.00: welcome party, MRT, MAIR, football game, RuSSIR party
 

Machine Learning Algorithms for Web-related problems (MLA)

Mikhail Bilenko (Microsoft Research) and Pavel Dmitriev (Cornell University)

Machine learning algorithms are widely used in web-related tasks, where, due to the large scale and varying quality of data, adaptive techniques provide significant advantages over manual approaches. Examples of applications where learning methods have been very successful include learning ranking functions for search engines, detecting spam, clustering news articles, and learning hierarchies in online tagging systems. This course will provide a brief introduction to the general area of machine learning, show how important problems in web search and mining can be solved using machine learning techniques, and discuss the problems and tradeoffs involved in applying machine learning approaches to web-scale datasets.

Slides (3 Mb)

Video: part 1 (134 Mb), part 2 (106 Mb), part 3 (110 Mb), part 4 (182 Mb).

Language: En

Music Information Retrieval (MIR)

Andreas Rauber (Vienna University of Technology)

In this course we will take a closer look at the various areas, tasks, and methods that together form the field of music information retrieval (MIR).

We will start by considering the various types of data that are relevant for MIR activities, ranging from symbolic and acoustic music data through textual data to image and video data. This will be followed by a brief overview of the large number of tasks and challenges in MIR, to provide a thorough understanding of the problem domain and its interdisciplinary nature.

The core part of the course will then address a number of selected topics. Specifically, we will focus on various techniques for feature extraction from music, and their use for tasks such as retrieval, genre classification, chord detection, and others. We will also analyze and discuss the benefits of combining different modalities, such as textual and acoustic information, as well as the use of web information for these tasks. Last, but not least, we will take a closer look at a few applications, such as the PlaySOM and PocketSOM, that assist users in organizing their music collections and creating playlists on desktop computers as well as mobile phones. We will also review current music web portals and discuss future directions in music consumption and distribution.

The course will be accompanied by a range of practical exercises, allowing participants to analyze their own music collections and test the proposed methods.

Language: En

Hyperlink Analysis on the Web: Approaches, Algorithms, and Applications (HAW)

Alexander Sychov (Voronezh State University)

The course starts with the nature of information retrieval in the context of the World Wide Web (WWW). The introduction of hyperlinks into documents affects both document representation and retrieval techniques, and the course examines this effect. The formal description of the WWW as a directed graph, together with its models and regularities, is discussed. Further, the effect of hyperlink analysis on relevance calculation and crawling strategy is demonstrated. An additional topic of the course is the self-organization and dynamics of the WWW. For preliminary reading, the following presentation is recommended: http://company.yandex.ru/class/courses/sychev.xml

Slides (1.6 Mb)

Language: Ru

Algorithms for Nearest Neighbor Search (ANNS)

Yury Lifshits (Steklov Institute of Mathematics at St. Petersburg)

The nearest neighbor problem is formulated as follows: given a set S of points in some space V (equipped with a similarity function), construct a data structure which, given any query point q from V, finds the point in S closest to q. Search problems of this kind arise in many areas: recommendation systems, text classification, personalized news aggregation, and targeting of online ads.
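As a point of reference for the problem statement, here is a minimal sketch of the simplest possible baseline: a linear scan over S with no data structure at all. It is not taken from the course materials. The function name `nearest_neighbor`, the sample points, and the use of Euclidean distance in place of a general similarity function are all assumptions made for illustration.

```python
import math


def nearest_neighbor(points, query, distance=math.dist):
    """Return the point in `points` closest to `query` by linear scan.

    `distance` is an assumed stand-in for the similarity function of the
    space V; Euclidean distance (math.dist, Python 3.8+) is the default.
    """
    if not points:
        raise ValueError("points must be non-empty")
    # Compare the query against every point: O(|S|) distance evaluations.
    return min(points, key=lambda p: distance(p, query))


# Hypothetical data: a small set S of 2-D points and a query point q.
S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(S, q))  # (1.0, 1.0)
```

Each query here costs one distance computation per point in S, which is the cost that a preprocessed data structure aims to reduce.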

Course page (incl. slides and references)

Video: part 1 (214 Mb), part 2 (251 Mb), part 3 (351 Mb), part 4 (237 Mb).

Language: En

Automated Text Classification Methods (ATCM)

Mikhail Ageev (Moscow State University)

This course will provide an introduction to the classical and modern problem statements for text categorization tasks. We will show different techniques and methods for text categorization, based on machine learning and knowledge-based approaches. We will also discuss the main problems of text categorization that lead to erroneous categorization.

Slides (2.6 Mb)

Video: part 1 (99 Mb), part 2 (196 Mb), part 3 (117 Mb), part 4 (286 Mb).

Language: Ru

Text Retrieval Systems Evaluation (TRSE)

Igor Kuralenok (St. Petersburg state university, Yandex)

An English description of the course is not available.

Slides (2 Mb)

Video: part 1 (411 Mb), part 2 (547 Mb), part 3 (585 Mb).

Language: Ru

Young Scientists' Conference on Information Retrieval (YSC)

See conference site for details.

Microsoft Technologies for Web Search (MSWS)

Marat Bakirov (Microsoft)

An English description of the course is not available.

Language: Ru

Map-Reduce Technique and Its Applications for IR (MRT)

Ivan Krasin (Google)

The lecture is dedicated to solving IR problems on large data sets. The Map-Reduce technique is widely used by different subsystems of Google Search and makes it easy to implement programs that retrieve information from indexed web pages. A set of examples demonstrates use cases for Map-Reduce. The lecture ends with a short talk about Google R&D in Russia.
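To give a flavor of the programming model, here is a minimal single-process sketch of the map, shuffle, and reduce steps, used to build a tiny inverted index. It is an illustration only, not Google's implementation and not an example from the lecture. The names `map_phase`, `reduce_phase`, and `map_reduce`, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, doc_id) pair for each word in one document."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Reduce step: merge all values for one key into a sorted posting list."""
    return word, sorted(set(doc_ids))


def map_reduce(documents):
    """Run map, shuffle, and reduce in memory (a toy, assumed setup)."""
    # Shuffle: group the mapped values by key before reducing.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, value in map_phase(doc_id, text):
            grouped[word].append(value)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


# Hypothetical corpus: document id -> text.
docs = {1: "web search engine", 2: "search the web", 3: "music retrieval"}
index = map_reduce(docs)
print(index["search"])  # [1, 2]
```

In a real deployment the map and reduce calls would run on many machines, with the framework handling the shuffle; the two user-supplied functions keep the same shape.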

Slides (4 Mb)

Video (47 Mb).

Language: Ru

Morphological analysis in IR tasks (MAIR)

Ilya Segalovich (Yandex)

1. What is morphology for in search tasks?
2. Mechanics behind morphological analysis.
3. Dictionary-free morphology: a survey.
4. Applications: spell checker, Web search, etc.

Video (111 Mb).

Language: Ru

Contacts

Please send all inquiries to school[at]romip[dot]ru.