ROMIP: Russian Information Retrieval Evaluation Seminar



Near Duplicates Detection Track


The Near Duplicates Detection Track evaluates content-based methods for detecting near duplicates in image collections. By near duplicates we mean images of exactly the same scene or object taken under different conditions. These conditions may differ in zoom, focus level, illumination, foreground occlusion, and viewpoint.

This task differs from the common task of transformed image recognition. For that task, evaluation datasets are usually synthetic: several images are processed automatically with various image transforms to produce a set of duplicate images. We instead provide a collection of natural near duplicates. Examples of images treated as near duplicates are below.

Examples of similar images, which are not duplicates:

For this track the standard procedure is used.

Test Collection

The collection of near duplicate images is used for this track.

Task Description for Participating Systems

Participants are to find all groups of near duplicates in the provided data collection. One image may belong to several groups of near duplicates. There are no restrictions on group size. The evaluation will be performed for the N largest groups found by participants.
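The selection of the N largest groups can be sketched as follows. This is only an illustration under assumptions: the function name top_n_groups, the representation of a run as a mapping from group id to a set of image names, and the tie-breaking rule (by group id) are all hypothetical and are not specified by the track.

```python
from typing import Dict, List, Set


def top_n_groups(groups: Dict[str, Set[str]], n: int) -> List[str]:
    """Return the ids of the n largest groups, largest first.

    Ties are broken by group id (an assumption; the track does not say).
    """
    return sorted(groups, key=lambda g: (-len(groups[g]), g))[:n]


# Hypothetical run: image names and group ids are made up for illustration.
groups = {
    "g1": {"a.jpg", "b.jpg", "c.jpg"},
    "g2": {"c.jpg", "d.jpg"},  # c.jpg belongs to two groups, which is allowed
    "g3": {"e.jpg", "f.jpg", "g.jpg", "h.jpg"},
}

selected = top_n_groups(groups, 2)
print(selected)
```

Because groups are stored as independent sets, an image that appears in several groups needs no special handling.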

Evaluation Methodology

Evaluation will be performed by independent assessors. False Positive Rate and False Negative Rate will be used as metrics.

  • instructions for assessors:
    Assessors are to mark all near duplicates of the given image among the images which were marked as duplicates by at least one participant.
  • official metrics
    • false positive rate
    • false negative rate
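The pooling step and the two metrics can be sketched for a single query image as follows. The exact formulas are not given here, so this sketch assumes the conventional definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP), computed over the pool of images reported by at least one participant. The names build_pool and error_rates and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def build_pool(runs: Dict[str, Set[str]]) -> Set[str]:
    """Union of the images reported as duplicates by at least one participant."""
    return set().union(*runs.values())


def error_rates(reported: Set[str], relevant: Set[str],
                pool: Set[str]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for one participant.

    reported: images the participant marked as near duplicates of the query
    relevant: images the assessors marked as near duplicates (within the pool)
    Assumed definitions: FPR = FP / (FP + TN), FNR = FN / (FN + TP).
    """
    tp = len(reported & relevant)
    fp = len(reported - relevant)
    fn = len((relevant & pool) - reported)
    tn = len(pool - reported - relevant)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Hypothetical submissions for one query image.
runs = {"sysA": {"b", "c", "x"}, "sysB": {"c", "d"}}
pool = build_pool(runs)
relevant = {"b", "c", "d"}  # assessor judgments over the pool

rates = {name: error_rates(found, relevant, pool) for name, found in runs.items()}
print(rates)
```

Since assessors only judge pooled images, a true near duplicate that no participant reported is never counted, so the false negative rate in this sketch is relative to the pool rather than to the whole collection.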

Data Formats