TY - GEN
T1 - MDPs as Distribution Transformers
T2 - 35th International Conference on Computer Aided Verification, CAV 2023
AU - Akshay, S.
AU - Chatterjee, Krishnendu
AU - Meggendorfer, Tobias
AU - Žikelić, Đorđe
N1 - Publisher Copyright:
© 2023, The Author(s).
PY - 2023
Y1 - 2023
N2 - Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization. In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples.
AB - Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization. In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples.
KW - Markov decision processes
KW - Skolem hardness
KW - distribution transformers
KW - invariant synthesis
UR - http://www.scopus.com/inward/record.url?scp=85169447755&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-37709-9_5
DO - 10.1007/978-3-031-37709-9_5
M3 - Conference contribution
AN - SCOPUS:85169447755
SN - 9783031377082
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 112
BT - Computer Aided Verification - 35th International Conference, CAV 2023, Proceedings
A2 - Enea, Constantin
A2 - Lal, Akash
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 17 July 2023 through 22 July 2023
ER -