
MEET: A Monte Carlo Exploration-Exploitation Trade-Off for Buffer Sampling

  • Julius Ott
  • Lorenzo Servadei
  • Jose Arjona-Medina
  • Enrico Rinaldi
  • Gianfranco Mauro
  • Daniela Sánchez Lopera
  • Michael Stephan
  • Thomas Stadelmayer
  • Avik Santra
  • Robert Wille

  • Infineon Technologies AG
  • Technical University of Munich
  • Johannes Kepler University Linz
  • University of Michigan, Ann Arbor

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards w.r.t. convergence and peak performance by 26% on average.
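The abstract does not state the sampling rule itself, so the following is only a minimal sketch of the general idea: score each stored transition by an exploitation term (the mean Q-value estimate) plus an exploration term (the spread of those estimates), then sample from the buffer in proportion to the score. The function names, the use of several Q-value estimates per transition, the visit-count decay, and the weight `c` are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random
import statistics


def sampling_priorities(q_estimates, visit_counts, c=1.0):
    """Return one sampling probability per stored transition.

    Hypothetical inputs (assumptions of this sketch, not the paper's API):
    q_estimates[i]  -- several Q-value estimates for transition i
    visit_counts[i] -- how often transition i has been sampled so far
    c               -- weight of the uncertainty bonus
    """
    scores = []
    for estimates, visits in zip(q_estimates, visit_counts):
        exploit = statistics.fmean(estimates)    # mean estimate
        explore = statistics.pstdev(estimates)   # disagreement as uncertainty
        # The bonus shrinks as a transition is sampled more often.
        scores.append(exploit + c * explore / math.sqrt(1 + visits))
    # Numerically stable softmax: scores become probabilities.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(buffer, probabilities, batch_size, rng=random):
    """Draw transitions with replacement, weighted by their probabilities."""
    return rng.choices(buffer, weights=probabilities, k=batch_size)


# Toy usage: three placeholder transitions, none sampled yet.
buffer = ["t0", "t1", "t2"]
q_estimates = [[1.0, 1.0, 1.0], [1.0, 3.0, -1.0], [0.5, 0.5, 0.5]]
probs = sampling_priorities(q_estimates, [0, 0, 0])
batch = sample_batch(buffer, probs, batch_size=4, rng=random.Random(0))
```

In this toy setup `t0` and `t1` have the same mean estimate, but `t1` receives a higher sampling probability because its estimates disagree; that is the exploration side of the trade-off the abstract describes.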

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
DOIs
State: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Keywords

  • experience replay
  • reinforcement learning
  • uncertainty estimation
