Pruning the Wikipedia Classification of Computer Languages

Status
Submitted for publication

Authors
Marcel Heinz and Ralf Lämmel

Abstract
Wikipedia represents rich ontological knowledge that is also amenable to automated extraction. In particular, Wikipedia’s classification graph may be used to provide a taxonomy within a field of interest. However, the classification graph has many issues that make pruning necessary. In this paper, we assemble a suite of bad smells to identify and remove flawed classification relationships. The smells take into account Wikipedia’s peculiarities, as described in its guidelines. We organize the smells in a topology to optimize the pruning process. The approach is evaluated for a taxonomy of computer languages; in this field, Wikipedia is arguably the most comprehensive knowledge base that exists.

Keywords
Computer languages. Wikipedia. Taxonomy. Bad smell. Pruning. Topology. Ontology debugging.

Bibtex entry
@unpublished{HeinzL16,
  author    = {Marcel Heinz and Ralf L{\"a}mmel},
  title     = {{Pruning the Wikipedia Classification of Computer Languages}},
  year      = {2016},
  note      = {15 pages. Under submission. Online since 18 July 2016}
}