TY - GEN
T1 - Regular expressions into finite automata
AU - Brüggemann-Klein, Anne
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 1992.
PY - 1992
Y1 - 1992
N2 - It is a well-established fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi have shown that the construction of an ε-free NFA due to Glushkov is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard, now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the expression, and that this is worst-case optimal. For deterministic expressions, our algorithm even runs in linear time. This improves on the cubic-time methods suggested in the literature. A major step of the algorithm consists in bringing the expression into what we call star normal form. This concept is also useful for characterizing the relationship between two types of unambiguity that have been studied in the literature. Namely, we show that, modulo a technical condition, an expression is strongly unambiguous if and only if it is weakly unambiguous and in star normal form. This leads to our third result, a quadratic-time decision algorithm for weak unambiguity that improves on the biquadratic method introduced by Book et al.
AB - It is a well-established fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi have shown that the construction of an ε-free NFA due to Glushkov is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard, now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the expression, and that this is worst-case optimal. For deterministic expressions, our algorithm even runs in linear time. This improves on the cubic-time methods suggested in the literature. A major step of the algorithm consists in bringing the expression into what we call star normal form. This concept is also useful for characterizing the relationship between two types of unambiguity that have been studied in the literature. Namely, we show that, modulo a technical condition, an expression is strongly unambiguous if and only if it is weakly unambiguous and in star normal form. This leads to our third result, a quadratic-time decision algorithm for weak unambiguity that improves on the biquadratic method introduced by Book et al.
UR - http://www.scopus.com/inward/record.url?scp=85030556445&partnerID=8YFLogxK
U2 - 10.1007/BFb0023820
DO - 10.1007/BFb0023820
M3 - Conference contribution
AN - SCOPUS:85030556445
SN - 9783540552840
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 87
EP - 98
BT - LATIN 1992 - 1st Latin American Symposium on Theoretical Informatics, Proceedings
A2 - Simon, Imre
PB - Springer-Verlag
T2 - 1st International Symposium on Latin American Theoretical Informatics, LATIN 1992
Y2 - 6 April 1992 through 10 April 1992
ER -