Content-Based Retrieval Track

Overview

The purpose of this track is to evaluate content-based image retrieval (CBIR) on a generic color photo collection with heterogeneous content, such as those found in personal photo archives.

The objective of the Content-Based Retrieval Track is to identify the images in the entire collection that match the query image globally or locally in terms of visual and semantic concepts. We consider two images to match globally when they depict a similar scene (for example, two night urban shots). Images match locally when they contain similar objects, possibly against different backgrounds.

Examples of images treated as similar to each other:

Examples of images treated as probably similar:

Examples of images treated as non-similar:

For this track the standard procedure is used.

Test Collection

The collection contains everyday photos of the kind found in private photo collections. The photos were taken by non-professional photographers, so they are sometimes of poor quality (for example, too dark or too light). No additional information about the images is provided (no annotations, keywords or context information).

The test dataset is a subset of the Flickr photo collection. It consists of ~20,000 natural still images taken by Flickr users all around the world. It includes indoor and outdoor scenes, landscapes and urban views, portraits and pictures of groups of people, as well as images with no particular subject, where it is hard to recognize what is depicted. Image size does not exceed 500 pixels in either dimension (the typical size is 500x375 pixels).

Most of the photos were taken by ordinary people, so their quality varies. The rate of near-duplicates is low.

Task Description for Participating Systems

We provide a list of images randomly selected from the same data collection. These images serve as queries for content-based query-by-example search. The expected result for each query is an ordered list of image names. The maximum list size is 100 per query.

Participants are allowed to submit more than one result list per query.
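
The constraints above can be checked before submission. The following is a minimal sketch, not an official tool: the function name, the in-memory representation of a result list, and the rules that duplicates and unknown image names count as problems are all assumptions made for illustration. Only the limit of 100 results per query comes from the task description.

```python
MAX_RESULTS_PER_QUERY = 100  # limit stated in the task description


def validate_result_list(image_names, collection):
    """Return a list of problems found in one ranked result list.

    image_names: ordered list of image names returned for one query.
    collection:  set of all image names in the test collection.
    The duplicate and unknown-name checks are assumptions, not track rules.
    """
    problems = []
    if len(image_names) > MAX_RESULTS_PER_QUERY:
        problems.append(f"too many results: {len(image_names)}")
    seen = set()
    for rank, name in enumerate(image_names, start=1):
        if name in seen:
            problems.append(f"duplicate at rank {rank}: {name}")
        seen.add(name)
        if name not in collection:
            problems.append(f"unknown image at rank {rank}: {name}")
    return problems
```

An empty return value means the list passed all checks; each submitted result list for a query would be validated separately.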

Evaluation Methodology

Evaluation will be performed by relevance assessors. Because image similarity judgments for this kind of task are highly subjective, several independent assessors will take part in the evaluation process. A pooling approach will be used to evaluate the results (pool depth is 50). Image pools will be created for a randomly selected subset of queries. Assessors will judge the pools for relevance on a three-level scale: 1) relevant, 2) probably relevant, 3) not relevant. Relevance assessments are based entirely on the visual content of the images.
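
As an illustration of the pooling step, the sketch below builds the pool for one query as the union of the top 50 images from every submitted result list, and picks a random subset of queries to judge. The function names, the data layout and the use of a fixed seed are assumptions for this example; the track description only states the pool depth and that the query subset is random.

```python
import random

POOL_DEPTH = 50  # pool depth stated in the evaluation methodology


def build_pool(result_lists, depth=POOL_DEPTH):
    """Union of the top-`depth` images from every result list for one query."""
    pool = set()
    for ranked in result_lists:
        pool.update(ranked[:depth])
    return pool


def select_queries(queries, sample_size, seed=0):
    """Pick a reproducible random subset of queries to be judged (assumed helper)."""
    rng = random.Random(seed)
    return rng.sample(sorted(queries), sample_size)
```

Images ranked below the pool depth in every result list never reach the assessors, so they remain unjudged; this is why a metric that tolerates incomplete judgments, such as bpref, is useful alongside precision and recall.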

Instructions for assessors:
- assessors evaluate image similarity to the query image based on the visual content

Evaluation method:
- pooling (pool depth is 50)

Relevance scale:
- yes / probably yes / no

Official metrics:
- precision
- recall
- TREC 11-point precision/recall graph
- bpref
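
The metrics listed above can be sketched for a single query as follows. This is an illustrative implementation under stated assumptions, not the official evaluation code: the mapping of the three-level scale to binary relevance (the `lenient` flag) is hypothetical, and the bpref variant shown follows the commonly used formulation that normalizes by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant images.

```python
def split_judgments(judgments, lenient=False):
    """Map 'yes' / 'probably yes' / 'no' labels to (relevant, nonrelevant) sets.

    With lenient=True, 'probably yes' counts as relevant (an assumption).
    """
    positive = {"yes", "probably yes"} if lenient else {"yes"}
    relevant = {name for name, label in judgments.items() if label in positive}
    return relevant, set(judgments) - relevant


def precision_recall(ranked, relevant):
    """Set-based precision and recall of one ranked result list."""
    hits = sum(1 for name in ranked if name in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    points, hits = [], 0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation: best precision achieved at any recall >= the level.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def bpref(ranked, relevant, nonrelevant):
    """bpref over judged images only; unjudged images are ignored."""
    num_rel, num_nonrel = len(relevant), len(nonrelevant)
    if num_rel == 0:
        return 0.0
    bound = min(num_rel, num_nonrel)
    nonrel_above, total = 0, 0.0
    for name in ranked:
        if name in nonrelevant:
            nonrel_above += 1
        elif name in relevant:
            penalty = min(nonrel_above, bound) / bound if bound else 0.0
            total += 1.0 - penalty
    return total / num_rel
```

For example, with the ranking a, x, b, y, c, where a, b and d are judged relevant, x and y are judged non-relevant and c is unjudged, precision is 2/5, recall is 2/3 and bpref is 0.5.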
