Value iteration for long-run average reward in Markov decision processes

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

30 Scopus citations

Abstract

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long-term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work, our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
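
The idea of running VI locally inside a single component can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses the textbook bound that, in a communicating and aperiodic MDP, the smallest and largest one-step value differences bracket the optimal long-run average reward. The function name `mean_payoff_bounds`, the dictionary encoding, and the two-state `EXAMPLE_MDP` are all hypothetical, chosen only for illustration.

```python
# Sketch only: span-based stopping for VI on ONE communicating, aperiodic MDP.
# This is a standard textbook criterion, not the algorithm of the paper.

# Hypothetical encoding: state -> action name -> (reward, {successor: probability})
EXAMPLE_MDP = {
    0: {"stay": (1.0, {0: 0.5, 1: 0.5}), "go": (0.0, {1: 1.0})},
    1: {"loop": (2.0, {1: 0.9, 0: 0.1}), "back": (0.0, {0: 1.0})},
}


def mean_payoff_bounds(mdp, eps=1e-6, max_iters=100_000):
    """Return (lower, upper) bounds on the optimal long-run average reward.

    Assumes the optimal average reward is the same from every state
    (communicating MDP) and that the iteration converges (aperiodicity).
    """
    values = {s: 0.0 for s in mdp}
    for _ in range(max_iters):
        # One Bellman update: best action by immediate reward plus expected value.
        new_values = {
            s: max(
                reward + sum(p * values[t] for t, p in succ.items())
                for reward, succ in actions.values()
            )
            for s, actions in mdp.items()
        }
        # The per-state increments bracket the optimal average reward.
        deltas = [new_values[s] - values[s] for s in mdp]
        lower, upper = min(deltas), max(deltas)
        if upper - lower < eps:
            return lower, upper
        # Shift by a constant so values stay bounded; increments are unaffected.
        offset = min(new_values.values())
        values = {s: v - offset for s, v in new_values.items()}
    raise RuntimeError("no convergence; the MDP may be periodic or not communicating")


if __name__ == "__main__":
    print(mean_payoff_bounds(EXAMPLE_MDP))
```

In the example, the best stationary policy plays `stay` in state 0 and `loop` in state 1, which yields an average reward of 11/6, so the returned interval should enclose that value. For an MDP with several end components this bracket no longer applies directly, which is where the MEC decomposition and the reachability step described in the abstract come in.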

Original language: English
Title of host publication: Computer Aided Verification - 29th International Conference, CAV 2017, Proceedings
Editors: Viktor Kuncak, Rupak Majumdar
Publisher: Springer Verlag
Pages: 201-221
Number of pages: 21
ISBN (Print): 9783319633862
DOIs
State: Published - 2017
Event: 29th International Conference on Computer Aided Verification, CAV 2017 - Heidelberg, Germany
Duration: 24 Jul 2017 to 28 Jul 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10426 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Computer Aided Verification, CAV 2017
Country/Territory: Germany
City: Heidelberg
Period: 24/07/17 to 28/07/17
