TY - GEN
T1 - MEET: A Monte Carlo Exploration-Exploitation Trade-Off for Buffer Sampling
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Ott, Julius
AU - Servadei, Lorenzo
AU - Arjona-Medina, Jose
AU - Rinaldi, Enrico
AU - Mauro, Gianfranco
AU - Lopera, Daniela Sánchez
AU - Stephan, Michael
AU - Stadelmayer, Thomas
AU - Santra, Avik
AU - Wille, Robert
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt their sampling strategies, including the exploration and exploitation of transitions, to the complexity of the task. To address this, the paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across the tested environments and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards in terms of convergence and peak performance by 26% on average.
AB - Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt their sampling strategies, including the exploration and exploitation of transitions, to the complexity of the task. To address this, the paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across the tested environments and show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards in terms of convergence and peak performance by 26% on average.
KW - experience replay
KW - reinforcement learning
KW - uncertainty estimation
UR - http://www.scopus.com/inward/record.url?scp=85177555817&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10095236
DO - 10.1109/ICASSP49357.2023.10095236
M3 - Conference contribution
AN - SCOPUS:85177555817
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 June 2023 through 10 June 2023
ER -