TY - JOUR
T1 - Unifying two views on multiple mean-payoff objectives in Markov decision processes
AU - Chatterjee, Krishnendu
AU - Křetínská, Zuzana
AU - Křetínský, Jan
N1 - Publisher Copyright:
© K. Chatterjee, Z. Křetínská, and J. Křetínský.
PY - 2017
Y1 - 2017
N2 - We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives. There exist two different views: (i) the expectation semantics, where the goal is to optimize the expected mean-payoff objective, and (ii) the satisfaction semantics, where the goal is to maximize the probability of runs such that the mean-payoff value stays above a given vector. We consider optimization with respect to both objectives at once, thus unifying the existing semantics. Precisely, the goal is to optimize the expectation while ensuring the satisfaction constraint. Our problem captures the notion of optimization with respect to strategies that are risk-averse (i.e., ensure a certain probabilistic guarantee). Our main results are as follows: First, we present algorithms for the decision problems, which are always polynomial in the size of the MDP. We also show that an approximation of the Pareto curve can be computed in time polynomial in the size of the MDP and the approximation factor, but exponential in the number of dimensions. Second, we present a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve our problem.
AB - We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives. There exist two different views: (i) the expectation semantics, where the goal is to optimize the expected mean-payoff objective, and (ii) the satisfaction semantics, where the goal is to maximize the probability of runs such that the mean-payoff value stays above a given vector. We consider optimization with respect to both objectives at once, thus unifying the existing semantics. Precisely, the goal is to optimize the expectation while ensuring the satisfaction constraint. Our problem captures the notion of optimization with respect to strategies that are risk-averse (i.e., ensure a certain probabilistic guarantee). Our main results are as follows: First, we present algorithms for the decision problems, which are always polynomial in the size of the MDP. We also show that an approximation of the Pareto curve can be computed in time polynomial in the size of the MDP and the approximation factor, but exponential in the number of dimensions. Second, we present a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve our problem.
UR - http://www.scopus.com/inward/record.url?scp=84945944146&partnerID=8YFLogxK
U2 - 10.23638/LMCS-13(2:15)2017
DO - 10.23638/LMCS-13(2:15)2017
M3 - Article
AN - SCOPUS:84945944146
SN - 1860-5974
VL - 13
JO - Logical Methods in Computer Science
JF - Logical Methods in Computer Science
IS - 2
M1 - 15
ER -