Wikipedia offers rich ontological knowledge that is also amenable to automated extraction. In particular, Wikipedia’s classification graph can be used to derive a taxonomy for a field of interest. However, Wikipedia’s classification graph has many issues that make pruning necessary. In this paper, we assemble a suite of bad smells to identify and remove flawed classification relationships. The smells take into account Wikipedia’s peculiarities as they are described in its guidelines. We organize the smells in a topology to optimize the pruning process. The approach is evaluated for a taxonomy of computer languages; in this field, Wikipedia arguably constitutes the most comprehensive knowledge base that exists.
Keywords
Computer languages, Wikipedia, Taxonomy, Bad smell, Pruning, Topology, Ontology debugging.
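The abstract does not spell out the individual smells, so the following Python sketch only illustrates the general idea of smell-based pruning under stated assumptions. Classification relationships are modeled as (parent, child) edges; each smell is a function that flags edges for removal; and the "topology" is assumed here to be a dependency order among smells, traversed with the standard library's `graphlib.TopologicalSorter`. Both smells (`non_taxonomic_name`, `unreachable`), the marker words, and `prune` are hypothetical illustrations, not the smells or algorithm defined in the paper.

```python
from graphlib import TopologicalSorter
from typing import Callable

Edge = tuple[str, str]  # (parent category, child category or page)
Smell = Callable[[set[Edge], str], set[Edge]]

# Hypothetical marker words for non-taxonomic children; not from the paper.
NON_TAXONOMIC_WORDS = {"people", "researchers", "software"}


def non_taxonomic_name(edges: set[Edge], root: str) -> set[Edge]:
    """Hypothetical smell: flag edges whose child name suggests it is not a kind of language."""
    return {(p, c) for (p, c) in edges
            if NON_TAXONOMIC_WORDS & set(c.lower().split())}


def unreachable(edges: set[Edge], root: str) -> set[Edge]:
    """Hypothetical smell: flag edges whose parent is no longer reachable from the root."""
    children: dict[str, set[str]] = {}
    for p, c in edges:
        children.setdefault(p, set()).add(c)
    # Depth-first traversal from the root; tolerates cycles via the `reached` set.
    reached, stack = {root}, [root]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in reached:
                reached.add(c)
                stack.append(c)
    return {(p, c) for (p, c) in edges if p not in reached}


SMELLS: dict[str, Smell] = {
    "non_taxonomic_name": non_taxonomic_name,
    "unreachable": unreachable,
}

# Assumed topology: each smell maps to the smells that must run before it.
# Reachability only makes sense after flawed edges have been cut.
TOPOLOGY: dict[str, set[str]] = {
    "non_taxonomic_name": set(),
    "unreachable": {"non_taxonomic_name"},
}


def prune(edges: set[Edge], root: str) -> set[Edge]:
    """Apply every smell in topological order, removing the edges it flags."""
    remaining = set(edges)
    for name in TopologicalSorter(TOPOLOGY).static_order():
        remaining -= SMELLS[name](remaining, root)
    return remaining
```

For instance, given the edges Computer languages → Programming languages → Functional languages plus Programming languages → Programming language researchers → Computer scientists, the first smell cuts the "researchers" edge, and the second then removes the now-orphaned "Computer scientists" edge, leaving only the two language edges. Ordering the smells this way means each one sees the graph already cleaned by the smells it depends on.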
@unpublished{HeinzL16,
author = {Marcel Heinz and Ralf L{\"a}mmel},
title = {{Pruning the Wikipedia Classification of Computer Languages}},
year = {2016},
note = {15 pages. Under submission. Online since 18 July 2016}
}