RuSSIR 2018 Program

Special Topic:

Information Retrieval for Good

With a special focus on applications in humanitarian, medical, and health domains.


  • Crisis Informatics

Carlos Castillo, Universitat Pompeu Fabra

Social media is an invaluable source of time-critical information during an emergency or a sudden-onset disaster. However, emergency response and humanitarian relief organizations that would like to use this information struggle with an avalanche of social media messages that exceeds human capacity to process. Computational methods to process these data, infer general parameters of a crisis, and determine priorities for intervention draw from many disciplines, including natural language processing, semantic technologies, data mining, machine learning, network analysis, human-computer interaction, and information visualization.

Big Crisis Data Book: http://bigcrisisdata.org

  • The Biases of Social Data

Carlos Castillo, Universitat Pompeu Fabra

Online social data such as user-generated content, expressed or implicit relationships between people, and behavioral traces are at the core of many popular web applications and platforms. Social data has been used to study a variety of domains including public policy, healthcare, economics and many social good applications. However, many academics and practitioners are also increasingly warning against the naive usage of social data. They highlight that there are biases and inaccuracies that occur at the source of the data and others that are introduced during the data processing pipeline; there are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked.

Companion tutorial with A. Olteanu, F. Diaz, and E. Kiciman: http://www.aolteanu.com/SocialDataLimitsTutorial/

  • The Information Retrieval Challenge of Lifelogs and Personal Life Archives

Cathal Gurrin, School of Computing, Dublin City University

In this course, we will consider the new forms of personal data being created today and explore how the current generation of IR tools can be deployed and enhanced to realise the potential of the new personal datasets. The course will consider the opportunities and challenges of data access, storage, indexing, retrieval and presentation. The state-of-the-art approaches developed for collaborative benchmarking competitions will be explored, and the course will end with a forward look at emerging research challenges and opportunities in the domain.

  • Evaluation of IR systems and multi-modal retrieval in the medical domain

Henning Müller, University of Geneva

This course will cover two separate parts: (1) medical image retrieval and (2) benchmarking of image analysis and retrieval applications and corresponding infrastructures. In terms of medical image retrieval, the course will explain application domains and cover text and visual retrieval, with a focus on combining visual cues with medical text or structured data (multimodal retrieval). In terms of benchmarking, medical data pose several challenges, such as data confidentiality, the quickly changing nature of the data, and increasingly large data sets. The Evaluation-as-a-Service paradigm has addressed several of these challenges, and examples of this approach will be presented and explained.

  • Conversational AI through Deep Learning

Valentin Malykh, Mikhail Burtsev, Moscow Institute of Physics and Technology

The course is dedicated to deep natural language processing, which has been a hot topic over the past few years. It will consist of an overview of the current state of the field of artificial intelligence, accompanied by real use cases, followed by an introduction to the major neural network architectures (convolutional and recurrent neural networks) as applied to problems in natural language processing. Deep reinforcement learning, which is currently on the rise in conversational AI, will also be covered in a short introduction. Course participants will gain hands-on experience in creating an intelligent chatbot.

  • Learning from User Interactions

Rishabh Mehrotra, Spotify Research

While users interact with online services (e.g. search engines, recommender systems, conversational agents), they leave behind fine-grained traces of interaction patterns. The ability to understand user behavior, record and interpret user interaction signals, gauge user satisfaction and incorporate user feedback gives online systems a vast treasure trove of insights for improvement and experimentation. More generally, the ability to learn from user interactions promises pathways for solving a number of problems and improving user engagement and satisfaction. Understanding and learning from user interactions involves a number of different aspects, from understanding user intent and tasks to developing user models and personalization services. A user’s understanding of their need and of the overall task develops as they interact with the system. Supporting the various stages of the task involves many aspects of the system, e.g. interface features, presentation of information, retrieval and ranking. Beyond understanding user needs, learning from user interactions involves developing the right metrics and experimentation systems, understanding user interaction processes and their usage context, and designing interfaces capable of helping users. The goal of this course is to present a detailed overview of these different research fields:

Phase I: Leveraging User Interactions for Understanding & Extracting User Tasks;

Phase II: Leveraging User Interactions for Learning User Representations;

Phase III: Behavioural Metrics & Experimentation.

  • Health Search

Guido Zuccon, Queensland University of Technology

This course will introduce researchers to the challenges and opportunities in health search, providing insights into current techniques and their results. It will also offer a hands-on overview of tools specific to the health domain made available by the clinical informatics and natural language processing communities. In particular, it will cover the different end user requirements, provide a hands-on introduction to domain-specific tools and present resources and campaigns for evaluation in health search.

  • Learning to Rank and Evaluation in the Online Setting

Harrie Oosterhuis, University of Amsterdam

In this course we will look at the fields of Online Evaluation and Online Learning to Rank. Methods from these fields are based around user interactions, and have been proven to be reliable and efficient even when very few interactions are available. However, user interactions bring their own difficulties: user behavior is very dependent on the system’s actions, so online methods must deal with interaction noise and with biases w.r.t. display positions, document selection, etc. Yet methods in the online setting overcome these issues and provide results that are more in line with the true user preferences than traditional methods. This course will detail the particularities of the online setting and show how online evaluation and online learning to rank methods still provide reliable results in this setting.

  • Retrieving Information Interactively Using Natural Language

Prasenjit Mitra, Pennsylvania State University

The module will discuss developments in natural language processing and how they are being leveraged to (a) pose queries in natural language, (b) send responses generated using natural language generation, and/or (c) engage in dialog with the end-user to refine questions or answers. Specifically, the course will introduce the student to the models of conversation and discourse in natural language, identify desirable properties of such conversations, and introduce issues involved with extracting the semantics of queries and the information needs of the end-users who pose the queries.