DMOZ Web Collection
This collection is based on the Russian-language section of the dmoz.org catalog and is used as a training set for Web page classification.
The collection consists of sites listed in the DMOZ second-level categories (starting from World -> Russian) that carry no explicit copyright notice. To keep the collection size reasonable, no more than 500 pages were taken from each Web site. The pages were selected by a breadth-first traversal of the site's link graph, starting from the home page.
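The page selection described above can be illustrated with a minimal sketch. It is not the crawler actually used to build the collection: the function name `bfs_pages`, the `get_links` callback, and the toy link graph are hypothetical, and a real crawler would also have to fetch pages, resolve URLs, and stay within a single site.

```python
from collections import deque


def bfs_pages(home, get_links, max_pages=500):
    """Return at most max_pages pages in breadth-first order from home.

    get_links is an assumed callback that returns the same-site links
    found on a page; it stands in for fetching and parsing.
    """
    seen = {home}          # pages already queued, to avoid revisiting
    queue = deque([home])  # FIFO queue gives breadth-first order
    collected = []
    while queue and len(collected) < max_pages:
        page = queue.popleft()
        collected.append(page)
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


# Toy link graph standing in for one Web site (illustrative data only).
site = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/a"],
}


def toy_links(page):
    return site.get(page, [])


print(bfs_pages("/", toy_links, max_pages=4))
```

With a cap of 4, the sketch keeps the home page and the pages closest to it, which is why breadth-first order suits a per-site page limit: pages near the home page are collected before deeper ones.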
Rights to Use
The copyright holders are the authors of the Web pages. Web sites that prohibit copying of their content were not included in the collection.
The program committee distributes this collection only to those who wish to take part in the Web site or Web page classification tracks. To get access to the collection, you must sign the usage agreement.
Tracks in Which the Collection Was Used