TY - JOUR
T1 - On information asymmetry in online reinforcement learning
AU - Tampubolon, Ezra
AU - Ceribasic, Haris
AU - Boche, Holger
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal.
AB - In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal.
KW - Information asymmetry
KW - Markov game
KW - Q-learning
KW - Reinforcement learning
KW - Resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85115130914&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9413968
DO - 10.1109/ICASSP39728.2021.9413968
M3 - Conference article
AN - SCOPUS:85115130914
SN - 1520-6149
VL - 2021-June
SP - 4955
EP - 4959
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -