The Amsterdam toolkit for language archaeology

Author

Ralf Lämmel

Abstract

GRK --- the Grammar Recovery Kit --- illustrates options for automation and corresponding tool support in the context of developing quality language references that readily cater for the derivation of parsers.

GRK provides a proof of concept for two notions: (i) semi-automatic grammar recovery; (ii) language-reference re-engineering. GRK's support for semi-automatic grammar recovery means that GRK can be used to obtain a relatively correct, complete, and implementable grammar from a language reference. GRK's support for language-reference re-engineering means that GRK can be used to update the original language reference so that it reflects the completed and corrected grammar knowledge.

As of today, GRK is particularly fit for Cobol archaeology, more specifically for IBM's VS Cobol II. That is, GRK offers a fully mechanised process in which IBM's reference is the input and the output is a transformed language reference whose grammar portions are correct and complete. (The recovery required several hundred simple transformation steps to deliver a grammar that is fit for parser derivation.) As a byproduct, GRK also generates a slow, Prolog-based parser. Via export to GRK's sibling, GDK (the Grammar Deployment Kit), a reasonably fast, btyacc-based parser can be generated as well. Both parsers accept all of the VS Cobol II code available to us (several million lines of code).
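
To make the idea of "simple transformation steps" more concrete, here is a minimal sketch in Python. It is not GRK's actual representation or operator set: the grammar encoding, the function names (`undefined_nonterminals`, `rename`, `define`), and the tiny Cobol-like fragment are all hypothetical and chosen only for illustration. The sketch shows how a raw grammar extracted from a reference might be checked for nonterminals that are used but never defined, and then repaired step by step.

```python
from typing import Dict, List, Set

# Hypothetical encoding: nonterminal -> list of alternatives,
# where each alternative is a list of symbols.
Grammar = Dict[str, List[List[str]]]


def is_nonterminal(symbol: str) -> bool:
    # Assumption for this sketch only: nonterminals are written in
    # lowercase, terminals (keywords, token classes) in uppercase.
    return symbol.islower()


def undefined_nonterminals(g: Grammar) -> Set[str]:
    """Return nonterminals that are used in some rule but never defined."""
    used = {s for alts in g.values() for alt in alts for s in alt
            if is_nonterminal(s)}
    return used - set(g)


def rename(g: Grammar, old: str, new: str) -> Grammar:
    """Transformation step: replace every occurrence of `old` by `new`."""
    def r(s: str) -> str:
        return new if s == old else s
    return {r(lhs): [[r(s) for s in alt] for alt in alts]
            for lhs, alts in g.items()}


def define(g: Grammar, nonterminal: str, alternatives: List[List[str]]) -> Grammar:
    """Transformation step: add a definition for a missing nonterminal."""
    if nonterminal in g:
        raise ValueError(f"{nonterminal} is already defined")
    return {**g, nonterminal: alternatives}


# A made-up raw grammar with two typical defects:
# a misspelt nonterminal ("identifer") and a missing definition ("data-name").
raw: Grammar = {
    "move-statement": [["MOVE", "identifier", "TO", "identifer"]],
    "identifier": [["data-name"]],
}

problems_before = undefined_nonterminals(raw)

# Two simple transformation steps repair the defects.
step1 = rename(raw, "identifer", "identifier")
recovered = define(step1, "data-name", [["COBOL-WORD"]])

problems_after = undefined_nonterminals(recovered)
```

In this sketch each step returns a new grammar rather than mutating the old one, so a recovery can be replayed as a plain sequence of steps against the unmodified input, which matches the idea of a mechanised process that starts from the original reference.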

Bibtex entries

@inproceedings{GRK,
 author = "Ralf L{\"a}mmel",
 title  = "{The Amsterdam toolkit for language archaeology}",
 booktitle = "{Post-proceedings of the 2nd International Workshop
               on Meta-Models, Schemas and Grammars for Reverse Engineering
               (ATEM 2004)}",
 year  = 2005,
 notes = "12 pages; To appear in ENTCS; Published by Elsevier Science"
}

Links

Paper: [.pdf]
Source code (GRK): GitHub repo
Browsable Cobol grammar: .html

Website maintained by Ralf Lämmel (Email: rlaemmel@gmail.com)