TY - JOUR
T1 - Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward
AU - Kretínský, Jan
AU - Michel, Fabian
AU - Michel, Lukas
AU - Pérez, Guillermo A.
N1 - Publisher Copyright:
© 2020 Proceedings of Machine Learning Research. All rights reserved.
PY - 2020
Y1 - 2020
N2 - We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
AB - We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
UR - http://www.scopus.com/inward/record.url?scp=85162660474&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85162660474
SN - 2640-3498
VL - 124
SP - 1149
EP - 1158
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020
Y2 - 3 August 2020 through 6 August 2020
ER -