TY - GEN
T1 - GrammarForge
T2 - 22nd International Conference on Software Engineering and Formal Methods, SEFM 2024
AU - Sochor, Hannes
AU - Ferrarotti, Flavio
AU - Wille, Robert
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Providing good methods for testing properties of software is critical. Such methods often depend on a formal description of the program input language, in particular, when they are based on grammar-based fuzzing. Unfortunately, it cannot be ensured that such a formal description is always available. To tackle this problem, we propose a new method that automates grammar learning for the input language of the software under test, combining classical language membership queries with light weight source code analysis tools such as control flow graphs and program instrumentation. We present a prototype implementation (GrammarForge) of our method, which works following a process of automated conservative substitution of place holders by terminals, starting from an initial grammar structure extracted from the source code. It targets arbitrary parses implemented using recursive descent techniques. We perform extensive experimentation, which shows that GrammarForge outperforms alternative tools with respect to accuracy of the learned grammar. Moreover, and different to all other state of the art methods, our learning process does not need seed inputs from the target language.
AB - Providing good methods for testing properties of software is critical. Such methods often depend on a formal description of the program input language, in particular, when they are based on grammar-based fuzzing. Unfortunately, it cannot be ensured that such a formal description is always available. To tackle this problem, we propose a new method that automates grammar learning for the input language of the software under test, combining classical language membership queries with light weight source code analysis tools such as control flow graphs and program instrumentation. We present a prototype implementation (GrammarForge) of our method, which works following a process of automated conservative substitution of place holders by terminals, starting from an initial grammar structure extracted from the source code. It targets arbitrary parses implemented using recursive descent techniques. We perform extensive experimentation, which shows that GrammarForge outperforms alternative tools with respect to accuracy of the learned grammar. Moreover, and different to all other state of the art methods, our learning process does not need seed inputs from the target language.
UR - http://www.scopus.com/inward/record.url?scp=85210891018&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-77382-2_16
DO - 10.1007/978-3-031-77382-2_16
M3 - Conference contribution
AN - SCOPUS:85210891018
SN - 9783031773815
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 272
EP - 289
BT - Software Engineering and Formal Methods - 22nd International Conference, SEFM 2024, Proceedings
A2 - Madeira, Alexandre
A2 - Knapp, Alexander
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 November 2024 through 8 November 2024
ER -