EMF Patterns of Usage on GitHub

Status
Published in the proceedings of ECMFA 2018

Authors
Johannes Härtel, Marcel Heinz, and Ralf Lämmel

Abstract
Mining software repositories is a common activity in software engineering with diverse use cases such as understanding project quality, technology usage, and developer profiles. Such mining activities involve, more often than not, a phase for data extraction from the source code in the repository with recurring tasks such as processing the folder structure (possibly on the timeline), classifying repository artifacts (e.g., in terms of the languages or technologies used), and extracting facts from the artifacts by parsing or otherwise. We describe a new approach for such data extraction; its key pillar is a declarative rule-based language for the uniform, inference-based extraction of facts from the repository (the file system), the artifacts in the repository (their content), and previously extracted facts. All inferred facts are maintained in a triple store. We describe a case study for the purpose of understanding the usage of EMF. To this end, we describe an emerging catalog of patterns of using EMF in repositories and we detect these patterns on GitHub. In our implementation, we use Apache Jena for which we provide dedicated language support tailored towards mining software repositories.
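To make the idea of uniform, inference-based extraction concrete, the following is a minimal sketch in Python. It is not the rule language or the Apache Jena-based implementation described in the paper. The repository is modeled as a plain dictionary from paths to file contents, the triple store is an in-memory set, and all rule names and predicate names (file_rule, ecore_rule, package_rule, elementOf, declares) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Repo = Dict[str, str]  # assumption: path -> file content stands in for a repository
Rule = Callable[[Repo, Set[Triple]], Iterable[Triple]]


def file_rule(repo: Repo, facts: Set[Triple]) -> Iterable[Triple]:
    # Facts from the file system: every path in the repository is a file.
    for path in repo:
        yield (path, "rdf:type", "File")


def ecore_rule(repo: Repo, facts: Set[Triple]) -> Iterable[Triple]:
    # Facts from previously extracted facts: classify files by extension.
    for subject, predicate, obj in facts:
        if predicate == "rdf:type" and obj == "File" and subject.endswith(".ecore"):
            yield (subject, "elementOf", "Ecore")


def package_rule(repo: Repo, facts: Set[Triple]) -> Iterable[Triple]:
    # Facts from artifact content: a crude substring check instead of real parsing.
    for subject, predicate, obj in facts:
        if predicate == "elementOf" and obj == "Ecore" and "EPackage" in repo[subject]:
            yield (subject, "declares", "EPackage")


def infer(repo: Repo, rules: List[Rule]) -> Set[Triple]:
    # Apply all rules repeatedly until no rule produces a new triple (fixpoint).
    facts: Set[Triple] = set()
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = set(rule(repo, facts)) - facts
            if new:
                facts |= new
                changed = True
    return facts


if __name__ == "__main__":
    repo = {
        "model/lib.ecore": '<ecore:EPackage name="lib"/>',
        "src/Main.java": "class Main {}",
    }
    # The rule order is deliberately reversed; the fixpoint loop still finds all facts.
    for triple in sorted(infer(repo, [package_rule, ecore_rule, file_rule])):
        print(triple)
```

All three kinds of rules read from and write to the same set of triples, so a rule over file content can build on a classification that another rule inferred earlier, regardless of the order in which the rules are listed.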

Keywords
Mining Software Repositories. EMF. Rule-based data extraction. Triple store. Pattern detection.

Downloads and links

Bibtex entry
@inproceedings{qegal,
  author    = {Johannes H{\"a}rtel and Marcel Heinz and Ralf L{\"a}mmel},
  title     = {{EMF} Patterns of Usage on {GitHub}},
  booktitle = {Proc.\ {ECMFA} 2018},
  publisher = {Springer},
  series    = {LNCS},
  year      = {2018}
}