TY - GEN
T1 - Unambiguity of extended regular expressions in SGML document grammars
AU - Brüggemann-Klein, Anne
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1993.
PY - 1993
Y1 - 1993
N2 - In the Standard Generalized Markup Language (SGML), document types are defined by context-free graznmars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that “an element… that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance.” In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous. A similar result has previously been obtained not for content models but for the smaller class of standard regular expressions. It reties on the fact that the languages of marked regular expressions are local—a property that does not hold any more for content models that contain the new &-operator. Therefore, it is necessary to develop new techniques for content models. Besides solving an interesting problem in formal language theory, our results are relevant for developers of SGML systems. In fact, our definitions are causing changes to the revised edition of the SGML standard, and the algorithm to test content models for unambiguity has been implemented in an SGML parser.
AB - In the Standard Generalized Markup Language (SGML), document types are defined by context-free graznmars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that “an element… that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance.” In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous. A similar result has previously been obtained not for content models but for the smaller class of standard regular expressions. It reties on the fact that the languages of marked regular expressions are local—a property that does not hold any more for content models that contain the new &-operator. Therefore, it is necessary to develop new techniques for content models. Besides solving an interesting problem in formal language theory, our results are relevant for developers of SGML systems. In fact, our definitions are causing changes to the revised edition of the SGML standard, and the algorithm to test content models for unambiguity has been implemented in an SGML parser.
UR - https://www.scopus.com/pages/publications/85029409396
U2 - 10.1007/3-540-57273-2_45
DO - 10.1007/3-540-57273-2_45
M3 - Conference contribution
AN - SCOPUS:85029409396
SN - 9783540572732
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 73
EP - 84
BT - Algorithms ESA 1993 – 1st Annual European Symposium, Proceedings
A2 - Lengauer, Thomas
PB - Springer Verlag
T2 - 1st Annual European Symposium on Algorithms, ESA 1993
Y2 - 30 September 1993 through 2 October 1993
ER -