On information asymmetry in online reinforcement learning

Ezra Tampubolon, Haris Ceribasic, Holger Boche

Publication: Contribution to journal, conference article, peer-reviewed

Abstract

In this work, we study a system of two interacting non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal.
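
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or environment: it assumes a stateless repeated matrix game, epsilon-greedy exploration, and a fixed learning rate, and the function name, parameters, and payoff matrices are hypothetical. The only feature taken from the abstract is the asymmetry: one agent keeps Q-values over its own actions, while the other observes that action and keeps Q-values conditioned on it.

```python
import random


def run_asymmetric_q_learning(payoff_a, payoff_b, steps=5000, alpha=0.1,
                              epsilon=0.1, seed=0):
    """Stateless Q-learning for two agents in a repeated matrix game.

    Hypothetical illustration only. Agent A (uninformed) keeps one
    Q-value per own action. Agent B (informed) observes A's current
    action before choosing, so it keeps one Q-value per
    (A's action, own action) pair.
    """
    rng = random.Random(seed)
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    q_a = [0.0] * n_a                          # A: Q over own actions
    q_b = [[0.0] * n_b for _ in range(n_a)]    # B: Q given A's action

    def eps_greedy(values):
        # Explore uniformly with probability epsilon, else act greedily.
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(steps):
        a = eps_greedy(q_a)        # A acts without seeing B
        b = eps_greedy(q_b[a])     # B acts after observing a
        # Move each Q-value toward the reward just received.
        q_a[a] += alpha * (payoff_a[a][b] - q_a[a])
        q_b[a][b] += alpha * (payoff_b[a][b] - q_b[a][b])
    return q_a, q_b


if __name__ == "__main__":
    # Assumed 2x2 coordination game; rows index A's action, columns B's.
    payoff_a = [[3, 0], [0, 1]]
    payoff_b = [[1, 0], [0, 3]]
    q_a, q_b = run_asymmetric_q_learning(payoff_a, payoff_b)
    print("Q-values of A:", q_a)
    print("Q-values of B given A's action:", q_b)
```

Because B's table is indexed by A's action, B effectively learns a best response to each action of A, whereas A must learn against B's changing responses. Whether such a scheme stabilizes depends on the game and on the learning parameters.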

Original language: English
Pages (from - to): 4955-4959
Number of pages: 5
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2021-June
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 June 2021 to 11 June 2021