3rd Russian Summer School in Information Retrieval
September 11-16, 2009, Petrozavodsk

Organizers

ROMIP: Russian Information Retrieval Evaluation Seminar
Petrozavodsk State University
Karelian Research Center, RAS
Russian Foundation for Basic Research
videolectures.net

Golden sponsor

Yandex

Bronze sponsor

Microsoft Research

School Program

The school program includes four main courses and a short course (two lectures). The preliminary schedule is as follows:

Sep 10, Th Sep 11, Fr Sep 12, Sa Sep 13, Su Sep 14, Mo Sep 15, Tu Sep 16, We
8.30-10.00   registration CoAd CoAd CoAd EDS LS4S
10.00-10.30 break break break break break
10.30-12.00 CoAd IRM MUBI MUBI MUBI YSC/ROMIP
12.00-13.00 lunch lunch   lunch lunch
13.00-14.30 IRM IRM excursion IRM LS4S
14.45-16.15 MUBI EDS MUBI IRM
16.15-16.45 registration break break break break
16.45-18.15 EDS Yandex lecture   EDS EDS
after 19.00 welcome party   sport evening RuSSIR party departure

Information Retrieval Modeling (IRM)

Djoerd Hiemstra, University of Twente

There is no such thing as a dominating model or theory of information retrieval, unlike the situation in, for instance, the area of databases, where the relational model is the dominating database model. In information retrieval, some models work for some applications, whereas others work for other applications. For instance, vector space models are well suited for similarity search and relevance feedback in many (also non-textual) situations if a good weighting function is available; the probabilistic retrieval model or naive Bayes model might be a good choice if examples of relevant and nonrelevant documents are available; Google's PageRank model is often used in situations that need modelling of more or less static relations between documents; region models have been designed to search in structured text; and language models are helpful in situations that require models of language similarity or document priors. In this tutorial, I carefully describe all these models by explaining the consequences of modelling assumptions. I address approaches based on statistical language models in great depth. After the course, students are able to choose a model of information retrieval that is adequate in new situations, and to apply the model in practical situations.
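As a rough illustration of the language-modeling approach mentioned above (not taken from the course materials), the sketch below ranks documents by query likelihood under a unigram model. The smoothing scheme (linear interpolation with the collection model), the weight `lam`, and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(query | doc) under a smoothed unigram language model.

    Each term probability is a mixture of the document model and the
    collection model; lam is an assumed mixing weight, not a tuned value.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection) if collection else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document indices ordered from most to least likely."""
    collection = [term for doc in docs for term in doc]
    scored = [(query_log_likelihood(query, doc, collection, lam), i)
              for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = [
    ["language", "models", "for", "retrieval"],
    ["relational", "database", "model"],
    ["vector", "space", "retrieval"],
]
ranking = rank(["language", "retrieval"], docs)
```

Smoothing with the collection model keeps a document from being ruled out entirely when it lacks one query term, which is why the second and third documents still receive finite scores here.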

Slides: lecture 1, lecture 2, lecture 3, lecture 4, lecture 5, lecture 6.
Video.

Modeling Web Searcher Behavior and Interactions (MUBI)

Eugene Agichtein, Emory University

Hundreds of millions of users search the web daily, clicking on the results, submitting and refining queries and otherwise interacting with the search engines. The vast amount of information generated as a by-product of these interactions can be mined to dramatically improve the effectiveness of web search, and information access in general.
This course will survey the research in modeling user behavior in web search, and how this information can improve web search effectiveness. The emphasis will be on learning and analyzing the appropriate data mining and machine learning techniques for the user behavior and interaction data, and on the integration of the behavioral models into the search engine operation.

Slides: lecture 1, lecture 2, lecture 3, lecture 4, lecture 5.
Video.

Enterprise and Desktop search (EDS)

Pavel Dmitriev, Yahoo! Labs
Pavel Serdyukov, University of Twente
Sergey Chernov, L3S Research Center

The Enterprise and Desktop Search problems have recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. While the challenges arising in intranet search are not entirely new compared to those that the web community has faced for years, advanced web search technologies are often unable to address them properly. In this course we give a research perspective on the distinctive features of both Enterprise and Desktop Search, typical search scenarios, and existing ranking techniques and algorithms. The first lecture gives a general introduction, reviews existing systems and outlines typical research challenges. In the next lecture we plan to summarize advanced ranking algorithms and personalization methods utilizing implicit and explicit feedback from users. The third lecture provides an overview of exploratory search methods, including search result clustering/categorization and faceted search, as well as related techniques stimulating interaction with a user. Later, we discuss the latest developments in expert/people search, for example, graph-based and language-model-based methods. The last lecture covers various aspects of Desktop search: state-of-the-art research prototypes, advanced real-world applications and recent breakthrough ideas like just-in-time retrieval and task context detection. The course is wrapped up with a discussion of open problems and research directions in Enterprise and Desktop search.

Slides: lecture 1, lecture 2, lectures 3-4, lecture 5.
Video.

Computational advertising: business models, technologies and issues (CoAd)

James G. Shanahan, Independent Consultant

Internet advertising revenues in the United States totaled $21 billion for 2007, up 25 percent versus 2006 revenues of $16.9 billion (according to the Interactive Advertising Bureau); this represents approximately half the worldwide revenue from online advertising. Fueled by these growth rates and the desire to provide added incentives and opportunities for both advertisers and publishers, alternative business models for online advertising are being developed. This tutorial will review the main business models of online advertising, including the pay-per-impression model (CPM); the pay-per-click model (CPC); a relative newcomer, the pay-per-action model (CPA), where an action could be a product purchase, a site visit, a customer lead, or an email signup; and dynamic CPM (dCPM), which optimizes a campaign towards the sites and site sections that perform best for the advertiser.
This tutorial will also discuss in detail the technology being leveraged to automatically target ads within these business models; this largely derives from the fields of machine learning (e.g., logistic regression, online learning), statistics (e.g., binomial maximum likelihood), information retrieval (vector space model, BM25), optimization theory (linear and quadratic programming), and economics (auction mechanisms, game theory). Challenges such as click fraud (the spam of online advertising), deception, privacy and other open issues will also be discussed. Web 2.0 applications such as social networks and video/photo sharing pose new challenges for online advertising; these will also be discussed.
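To make the pricing models above concrete, here is a small sketch (not from the course materials) that converts CPM, CPC and CPA bids into expected revenue per thousand impressions so they can be compared on one scale. The function name `expected_revenue_per_mille` and the example rates are hypothetical, and the click-through and conversion rates are assumed to be known in advance, which in practice they are not.

```python
def expected_revenue_per_mille(model, bid, ctr=0.0, cvr=0.0):
    """Expected revenue per 1000 impressions for a bid under a pricing model.

    model: "CPM" (bid is the price per 1000 impressions),
           "CPC" (bid is the price per click),
           "CPA" (bid is the price per action).
    ctr:   assumed probability that an impression is clicked.
    cvr:   assumed probability that a click leads to an action.
    """
    if model == "CPM":
        return bid
    if model == "CPC":
        return bid * ctr * 1000
    if model == "CPA":
        return bid * ctr * cvr * 1000
    raise ValueError(f"unknown pricing model: {model}")


# Illustrative numbers only: a 1% click-through rate and a 5% conversion rate.
cpm_value = expected_revenue_per_mille("CPM", 4.0)
cpc_value = expected_revenue_per_mille("CPC", 0.50, ctr=0.01)
cpa_value = expected_revenue_per_mille("CPA", 20.0, ctr=0.01, cvr=0.05)
```

Under these assumed rates the CPA bid is worth the most per thousand impressions, but that ordering depends entirely on how well the rates are estimated, which is where the machine learning and statistics mentioned above come in.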

Slides: lecture 1, lecture 2, lecture 3, lecture 4.
Video.

Linguistic Semantics for Search Precision and Recall Improvement (LS4S)

(short course)

Ilya Tikhomirov, Institute for Systems Analysis of RAS

Modern search engines return non-relevant documents too often. The main reason is that search engines use algorithms based on various statistical scores of text documents, rather than on “understanding” the meaning of the queries and the contents of the documents. To understand the meaning of a query and of documents, we should use a semantic model that allows us to match a query with documents semantically. Since we deal with natural language documents, we should use an adequate linguistic model and text representation model.
The first lecture gives a general introduction to Linguistic Semantics, its brief history and definitions of natural language processing levels (lemmatizing, morphological, syntactic and semantic analysis). An introduction to Communicative Grammar is also given.
The second lecture gives an introduction to Heterogeneous Semantic Networks. We consider the text representation model in semantic search tasks: how Communicative Grammar and Heterogeneous Semantic Networks can be used to improve search precision and recall.

Slides: lecture 1, lecture 2.
Video.

Greedy Function Optimization in Learning to Rank (Yandex lecture)

Andrey Gulin, Pavel Karpovich, Yandex

Greedy function approximation and boosting algorithms are well suited for solving practical machine learning tasks. We will describe well-known boosting algorithms and their modifications used for solving learning to rank problems.
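For readers unfamiliar with greedy function approximation, the following is a minimal sketch of gradient boosting with squared loss and one-feature regression stumps. It is a generic pointwise illustration, not the algorithm presented in the lecture; the names `fit_stump`, `boost` and `predict`, the learning rate, and the toy relevance labels are all assumptions.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on xs that best fits residuals (squared error)."""
    best = None
    # The largest value is skipped so the right-hand side is never empty.
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    if best is None:
        # All feature values are identical: fall back to a constant.
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def boost(xs, ys, rounds=50, lr=0.1):
    """Greedily add stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return base, lr, stumps


def predict(model, x):
    """Score one feature value with the boosted ensemble."""
    base, lr, stumps = model
    return base + sum(lr * (lv if x <= thr else rv) for thr, lv, rv in stumps)


# Toy data: one feature value per document and a binary relevance label.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
scores = [predict(model, x) for x in xs]
```

Sorting documents by `scores` gives a ranking. Learning-to-rank variants typically replace the squared loss with a loss defined over pairs or lists of documents while keeping the same greedy, stage-wise fitting loop.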

Slides, video.

Contacts

Please send all inquiries to school[at]romip[dot]ru.