Finite-memory near-optimal learning for Markov decision processes with long-run average reward

Jan Kretínský, Fabian Michel, Lukas Michel, Guillermo A. Pérez

Research output: Contribution to conference › Paper › peer-review


Abstract

We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms also for these cases.
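To make the objective concrete, below is a minimal Python sketch of the long-run average reward (mean payoff) estimated by simulating a fixed memoryless policy on a small MDP. The two-state model, rewards, and policies here are hypothetical examples, not taken from the paper, and the sketch does not reproduce the paper's online learning algorithms or the forgetfulness gadget; it only illustrates the quantity being optimized.

```python
import random

# Hypothetical 2-state MDP, for illustration only (not from the paper).
# transitions[state][action] = list of (next_state, probability)
# rewards[state][action]     = immediate reward for taking action in state
transitions = {
    0: {"a": [(0, 0.9), (1, 0.1)], "b": [(1, 1.0)]},
    1: {"a": [(0, 0.5), (1, 0.5)], "b": [(1, 1.0)]},
}
rewards = {
    0: {"a": 0.0, "b": 1.0},
    1: {"a": 0.0, "b": 2.0},
}

def mean_payoff(policy, start=0, steps=100_000, seed=0):
    """Estimate the long-run average reward of a memoryless policy
    (a dict state -> action) by simulating one long trajectory."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        action = policy[state]
        total += rewards[state][action]
        successors, probs = zip(*transitions[state][action])
        state = rng.choices(successors, weights=probs)[0]
    return total / steps

# Always playing "b" drives the chain into state 1 and earns reward 2 forever,
# so the estimated mean payoff approaches 2.0 as steps grows.
print(mean_payoff({0: "b", 1: "b"}))
```

A policy's mean payoff depends only on its long-run behavior, not on transient rewards, which is why the policy always playing "b" achieves close to 2.0 despite the lower reward collected in the initial state.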

Original language: English
Pages: 1149-1158
Number of pages: 10
State: Published - 2020
Event: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020 - Virtual, Online
Duration: 3 Aug 2020 - 6 Aug 2020

Conference

Conference: 36th Conference on Uncertainty in Artificial Intelligence, UAI 2020
City: Virtual, Online
Period: 3/08/20 - 6/08/20
