The school program includes four main courses and a short course (two lectures). The preliminary schedule is as follows:
|Sep 10, Th||Sep 11, Fr||Sep 12, Sa||Sep 13, Su||Sep 14, Mo||Sep 15, Tu||Sep 16, We|
Djoerd Hiemstra, University of Twente
There is no such thing as a dominating model or theory of information retrieval, unlike the situation in, for instance, the area of databases, where the relational model is the dominating database model. In information retrieval, some models work for some applications, whereas others work for other applications. For instance, vector space models are well suited for similarity search and relevance feedback in many (also non-textual) situations if a good weighting function is available; the probabilistic retrieval model or naive Bayes model might be a good choice if examples of relevant and nonrelevant documents are available; Google's PageRank model is often used in situations that need modelling of more or less static relations between documents; region models have been designed to search in structured text; and language models are helpful in situations that require models of language similarity or document priors. In this tutorial, I carefully describe all these models by explaining the consequences of modelling assumptions. I address approaches based on statistical language models in great depth. After the course, students are able to choose a model of information retrieval that is adequate in new situations, and to apply the model in practical situations.
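As a rough illustration of the language-model approach mentioned in this abstract, the sketch below ranks documents by the smoothed likelihood of generating the query. It is a minimal example under stated assumptions, not material from the course: the function name `query_log_likelihood`, the mixing weight `lam`, and the two toy documents are all hypothetical, and linear interpolation with the collection model is just one common smoothing choice.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(query | doc) under a unigram model.

    The document model is linearly interpolated with the collection
    model; lam is a hypothetical mixing weight chosen for illustration.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = coll_tf[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy data, invented for this example only.
docs = {
    "d1": "information retrieval models rank documents".split(),
    "d2": "relational databases store tables".split(),
}
collection = [term for doc in docs.values() for term in doc]
query = "retrieval models".split()

ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
print(ranking)
```

Smoothing with the collection model keeps a document from scoring zero merely because it lacks one query term, which is why `d2` still receives a finite score here.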
Eugene Agichtein, Emory University
Hundreds of millions of users search the web daily, clicking on the results, submitting and refining
queries and otherwise interacting with the search engines. The vast amount of information generated
as a by-product of these interactions can be mined to dramatically improve the effectiveness of web
search, and information access in general.
This course will survey the research in modeling user behavior in web search, and how this information can improve web search effectiveness. The emphasis will be on learning and analyzing the appropriate data mining and machine learning techniques for the user behavior and interaction data, and on the integration of the behavioral models into the search engine operation.
The Enterprise and Desktop Search problems have recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. While the challenges arising in intranet search are not entirely new compared to those that the web community has faced for years, advanced web search technologies are often unable to address them properly. In this course we give a research perspective on the distinctive features of both Enterprise and Desktop Search, typical search scenarios, and existing ranking techniques and algorithms. The first lecture gives a general introduction, reviews existing systems and outlines typical research challenges. In the next lecture we plan to summarize advanced ranking algorithms and personalization methods utilizing implicit and explicit feedback from users. The third lecture provides an overview of exploratory search methods, including search result clustering/categorization and faceted search, as well as related techniques stimulating interaction with a user. Later, we discuss the latest developments in expert/people search, for example graph-based and language-model-based methods. The last lecture covers various aspects of Desktop Search: state-of-the-art research prototypes, advanced real-world applications and recent breakthrough ideas like just-in-time retrieval and task context detection. The course is wrapped up with a discussion of open problems and research directions in Enterprise and Desktop Search.
James G. Shanahan, Independent Consultant
Internet advertising revenues in the United States totaled $21 billion for 2007, up 25 percent
versus 2006 revenues of $16.9 billion (according to the Interactive Advertising Bureau); this represents
approximately half the worldwide revenue from online advertising. Fueled by these growth rates and the
desire to provide added incentives and opportunities for both advertisers and publishers, alternative
business models for online advertising are being developed. This tutorial will review the main business
models of online advertising, including the pay-per-impression model (CPM); the pay-per-click
model (CPC); a relative newcomer, the pay-per-action model (CPA), where an action could be a product
purchase, a site visit, a customer lead, or an email signup; and dynamic CPM (dCPM), which optimizes a
campaign towards the sites and site sections that perform best for the advertiser.
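One common way to compare these pricing models is to convert each into expected revenue per thousand impressions (often called effective CPM). The sketch below does this under simplifying assumptions that are not taken from the tutorial: an action is assumed to follow a click, the click-through and conversion rates are assumed known, dCPM is not modeled, and the function name `ecpm` and all numbers are hypothetical.

```python
def ecpm(model, price, ctr=0.0, conversion_rate=0.0):
    """Expected revenue per 1000 impressions for a given pricing model.

    price: amount paid per 1000 impressions (CPM), per click (CPC),
           or per action (CPA).
    ctr: assumed probability that an impression is clicked.
    conversion_rate: assumed probability that a click leads to an action.
    """
    if model == "CPM":
        return price
    if model == "CPC":
        return price * ctr * 1000
    if model == "CPA":
        return price * ctr * conversion_rate * 1000
    raise ValueError(f"unknown pricing model: {model}")


# Hypothetical bids, invented for illustration only.
bids = {
    "CPM": ecpm("CPM", price=2.00),
    "CPC": ecpm("CPC", price=0.50, ctr=0.01),
    "CPA": ecpm("CPA", price=20.00, ctr=0.01, conversion_rate=0.02),
}
best = max(bids, key=bids.get)
print(bids, best)
```

Putting all bids on the same per-impression scale is what lets a publisher rank ads sold under different models against each other; the quality of that ranking depends entirely on how well the rates are estimated.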
This tutorial will also discuss in detail the technology being leveraged to automatically target ads within these business models; this largely derives from the fields of machine learning (e.g., logistic regression, online learning), statistics (e.g., binomial maximum likelihood), information retrieval (vector space model, BM25), optimization theory (linear and quadratic programming), and economics (auction mechanisms, game theory). Challenges such as click fraud (the spam of online advertising), deception, privacy and other open issues will also be discussed, as will the new challenges that Web 2.0 applications such as social networks and video/photo sharing pose for online advertising.
Ilya Tikhomirov, Institute for Systems Analysis of RAS
Modern search engines return non-relevant documents too often. The
main reason is that search engines use algorithms based on various
statistical scores of text documents, rather than on “understanding”
the meaning of the queries and the contents of the documents. To
understand the meaning of a query and documents we should use some
semantic model which allows us to semantically match a query with
documents. Since we deal with natural language documents we should use
an adequate linguistic model and text representation model.
The first lecture gives a general introduction to Linguistic Semantics, its brief history and definitions of natural language processing levels (lemmatizing, morphological, syntactic and semantic analysis). An introduction to Communicative Grammar is also given.
The second lecture gives an introduction to Heterogeneous Semantic Networks. We consider the text representation model in semantic search tasks: how Communicative Grammar and Heterogeneous Semantic Networks can be used to improve search precision and recall.
Andrey Gulin, Pavel Karpovich, Yandex
Greedy function approximation and boosting algorithms are well suited for solving practical machine learning tasks. We will describe well-known boosting algorithms and their modifications used for solving learning-to-rank problems.
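To give a flavor of greedy function approximation, the sketch below fits an additive model of regression stumps to squared-loss residuals, one stump per round. It is a generic, pointwise illustration under stated assumptions and not the lecturers' algorithm: the data, the one-dimensional feature, the learning rate `rate`, and all function names are hypothetical, and ranking-specific losses are not modeled.

```python
def fit_stump(xs, residuals):
    """Find the threshold split of a 1-D feature with least squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All feature values are identical: fall back to a constant.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict(model, x):
    """Sum the base value and the shrunken outputs of all stumps."""
    base, rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += rate * (left_value if x <= threshold else right_value)
    return total


def fit_boosting(xs, ys, rounds=20, rate=0.5):
    """Greedily add stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    model = (base, rate, stumps)
    for _ in range(rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return model


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy relevance labels for a single hypothetical feature.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]

baseline = fit_boosting(xs, ys, rounds=0)
boosted = fit_boosting(xs, ys, rounds=20)
print(mean_squared_error(baseline, xs, ys), mean_squared_error(boosted, xs, ys))
```

For squared loss the residuals are the negative gradient of the loss, so each round is a greedy step in function space; other losses would change only how the targets for the next stump are computed.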
Please send all inquiries to school[at]romip[dot]ru.