Web Page Classification Track
Overview
The purpose of this track is to evaluate methods of Web page topic classification.
This track follows the standard procedure.
Test Collection
The source dataset consists of the BY.web and DMOZ collections. The latter is used as the training set.
Although the training set consists of web sites, pages from the same site can still be assigned different topics.
Task Description for Participating Systems
As in the Web site classification track, each participant is granted access to the training set (DMOZ) and the BY.web collection. The task is to assign topics from the training set to each document in the collection. Each document may be assigned from 0 to 5 topics. The difference from the Web site classification track is that web sites are used only for training.
The expected result is an ordered list of web pages for each category, sorted in descending order of confidence.
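As an illustration only, the sketch below shows one way a participating system might turn per-page confidence scores into the expected output: at most 5 topics per page, then one list of pages per category sorted by descending confidence. The function names, the input layout (a dictionary of page identifiers mapped to topic scores), and the 0.5 threshold are assumptions made for this example; the track does not prescribe any of them, nor a particular submission file format.

```python
from collections import defaultdict

MAX_TOPICS_PER_PAGE = 5  # upper bound stated in the task description


def select_topics(scores, threshold=0.5, max_topics=MAX_TOPICS_PER_PAGE):
    """Keep at most max_topics topics whose confidence reaches the threshold.

    scores is a hypothetical {topic: confidence} mapping for a single page.
    The threshold value is an assumption, not part of the track rules.
    """
    kept = [(topic, conf) for topic, conf in scores.items() if conf >= threshold]
    # Highest confidence first; ties are broken by topic name for determinism.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept[:max_topics]


def build_ranked_lists(page_scores, threshold=0.5):
    """Return {topic: [(page_id, confidence), ...]} in descending confidence."""
    ranked = defaultdict(list)
    for page_id, scores in page_scores.items():
        for topic, conf in select_topics(scores, threshold):
            ranked[topic].append((page_id, conf))
    for topic in ranked:
        ranked[topic].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked)


if __name__ == "__main__":
    # Made-up scores for three pages; "p3" ends up with zero topics.
    demo = {
        "p1": {"Sports": 0.9, "News": 0.6},
        "p2": {"Sports": 0.95, "News": 0.1},
        "p3": {},
    }
    for topic, pages in build_ranked_lists(demo).items():
        print(topic, pages)
```

A page whose scores all fall below the threshold simply appears in no category list, which corresponds to the allowed case of 0 topics per document.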