TY - JOUR
T1 - One-Unambiguous Regular Languages
AU - Brüggemann-Klein, Anne
AU - Wood, Derick
N1 - Funding Information:
* The work of the second author was supported under a Natural Sciences and Engineering Research Council of Canada Grant and under an Information Technology Research Centre of Ontario Grant. -WWW: http: www11.informatik.tu-muenchen.de. WWW: http: www.cs.ust.hk dwood.
PY - 1998/5/1
Y1 - 1998/5/1
N2 - The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book et al. (1971, IEEE Trans. Comput. C-20(2), 149-153) each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regular language in terms of structural properties of the minimal deterministic automata that recognize them. As a result we are able to prove the decidability of whether a given regular expression denotes a 1-unambiguous language; if it does, then we can construct an equivalent 1-unambiguous regular expression in worst-case optimal time.
AB - The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book et al. (1971, IEEE Trans. Comput. C-20(2), 149-153) each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regular language in terms of structural properties of the minimal deterministic automata that recognize them. As a result we are able to prove the decidability of whether a given regular expression denotes a 1-unambiguous language; if it does, then we can construct an equivalent 1-unambiguous regular expression in worst-case optimal time.
UR - http://www.scopus.com/inward/record.url?scp=0000447268&partnerID=8YFLogxK
U2 - 10.1006/inco.1997.2695
DO - 10.1006/inco.1997.2695
M3 - Article
AN - SCOPUS:0000447268
SN - 0890-5401
VL - 142
SP - 182
EP - 206
JO - Information and Computation
JF - Information and Computation
IS - 2
ER -