Large-scale, AST-based API-usage analysis of open-source Java projects

Published in Proceedings of SAC 2011

Ralf Lämmel and Ekaterina Pek and Jürgen Starek

Research on API migration and language conversion can be informed by empirical data about API usage. For instance, such data may help with designing and defending mapping rules for API migration in terms of relevance and applicability. We describe an approach to large-scale API-usage analysis of open-source Java projects, which we also instantiate for the SourceForge open-source repository in a certain way. Our approach covers checkout, building, tagging with metadata, fact extraction, analysis, and synthesis with a large degree of automation. Fact extraction relies on resolved (type-checked) ASTs. We describe a few examples of API-usage analysis; they are motivated by API migration. These examples are concerned with analysing API footprint (such as the numbers of distinct APIs used in a project), API coverage (such as the percentage of methods of an API used in a corpus), and framework-like vs. class-library-like usage.

Bibtex entry
 author = "Ralf L{\"a}mmel and Ekaterina Pek and J{\"u}rgen Starek",
 title = "{Large-scale, AST-based API-usage analysis of open-source Java projects}",
 booktitle = "{SAC'11 - ACM 2011 SYMPOSIUM ON APPLIED COMPUTING, Technical Track on ``Programming Languages''}",
 year  = 2011,

Downloads and links