ROMIP: Russian Information Retrieval Evaluation Seminar


Image annotation track


This track evaluates automatic content-based image annotation methods. Any approach to image annotation is welcome: participants may, for example, use object recognition algorithms or scene classification methods (indoor/outdoor, city/landscape, etc.) and tag a source image with the resulting object or scene labels. As a matter of principle, no training set is provided to the participants for this task.

For this track the standard procedure is used.

Test Collection

Flickr image test collection is used for this track.

Task Description for Participating Systems

We provide a list of 2000 images randomly selected from the same data collection. The expected result is a list of textual tags for every provided image, with at most 15 tags per image.

Evaluation Methodology

Evaluation will be performed by relevance assessors.

A random subset of the task images will be selected for evaluation. For every selected image, a tagging "pool" will be built from the tags all participants provided for that image. Assessors will then judge each pooled tag for relevance to the image.
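Pool construction amounts to taking, per selected image, the union of the tags submitted across all runs. A minimal sketch (the data shapes and names here are illustrative assumptions, not ROMIP's actual formats):

```python
# Illustrative sketch of pool construction.
# Assumption: each run is a dict mapping image id -> list of submitted tags.

def build_pools(runs, selected_images):
    """Union the tags all participants produced for each selected image."""
    pools = {img: set() for img in selected_images}
    for run in runs:
        for img in selected_images:
            pools[img].update(run.get(img, []))
    return pools
```

Each resulting pool is then handed to assessors, who judge every tag in it against the image.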

  • Instructions for assessors:
Assessors evaluate whether a tag is semantically relevant to an image.
  • Relevance scale:
    • Tag is valid: it describes the semantics of a whole image or a part of an image
• Tag is not valid: it does not describe the image
  • Official metrics:
    • Tags accuracy (the percentage of valid tags among all tags of a participant)
• Average number of valid tags per image
    • Tags precision (for the intersection of participants’ tag vocabularies)
    • Tags recall (for the intersection of participants’ tag vocabularies)
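The official metrics above can be sketched as follows. This is an illustrative reading of the definitions, not the official scorer: `submitted` maps an image id to a participant's tag list, `valid` maps an image id to the set of tags assessors judged valid, and `shared_vocab` stands for the intersection of the participants' tag vocabularies.

```python
# Hedged sketch of the track's metrics; data shapes are assumptions.

def tag_accuracy(submitted, valid):
    """Percentage of valid tags among all tags a participant submitted."""
    total = sum(len(tags) for tags in submitted.values())
    good = sum(sum(1 for t in tags if t in valid.get(img, set()))
               for img, tags in submitted.items())
    return 100.0 * good / total if total else 0.0

def avg_valid_per_image(submitted, valid):
    """Average number of valid tags per image."""
    counts = [sum(1 for t in tags if t in valid.get(img, set()))
              for img, tags in submitted.items()]
    return sum(counts) / len(counts) if counts else 0.0

def precision_recall_on_shared(submitted, valid, shared_vocab):
    """Precision and recall restricted to the shared tag vocabulary
    (one plausible reading of the intersection-based metrics)."""
    sub = hit = rel = 0
    for img, tags in submitted.items():
        v = valid.get(img, set())
        sub += sum(1 for t in tags if t in shared_vocab)
        hit += sum(1 for t in tags if t in shared_vocab and t in v)
        rel += len(v & shared_vocab)
    precision = hit / sub if sub else 0.0
    recall = hit / rel if rel else 0.0
    return precision, recall
```

Restricting precision and recall to the shared vocabulary makes runs with very different tag sets comparable on the labels they all could have produced.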

Data Formats