Out-of-Distribution Detection for Reinforcement Learning Agents with Probabilistic Dynamics Models

Tom Haider, Karsten Roscher, Felippe Schmoeller da Roza, Stephan Günnemann

Publication: Contribution to journal, conference article, peer-reviewed

7 citations (Scopus)

Abstract

Reliability of reinforcement learning (RL) agents is a largely unsolved problem. Especially in situations that substantially differ from their training environment, RL agents often exhibit unpredictable behavior, potentially leading to performance loss, safety violations or catastrophic failure. Reliable decision making agents should therefore be able to cast an alert whenever they encounter situations they have never seen before and do not know how to handle. While the problem, also known as out-of-distribution (OOD) detection, has received considerable attention in other domains such as image classification or sensory data analysis, it is less frequently studied in the context of RL. In fact, there is not even a common understanding of what OOD actually means in RL. In this work, we want to bridge this gap and approach the topic of OOD in RL from a general perspective. For this, we formulate OOD in RL as severe perturbations of the Markov decision process (MDP). To detect such perturbations, we introduce a predictive algorithm utilizing probabilistic dynamics models and bootstrapped ensembles. Since existing benchmarks are sparse and limited in their complexity, we also propose a set of evaluation scenarios with OOD occurrences. A detailed analysis of our approach shows superior detection performance compared to existing baselines from related fields.
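
The general idea of scoring transitions with a bootstrapped ensemble of probabilistic dynamics models can be illustrated with a minimal sketch. This is not the authors' algorithm: a linear-Gaussian model stands in for the probabilistic dynamics model, the toy dynamics and the flipped-action perturbation are invented for illustration, and all function names (fit_linear_gaussian, fit_bootstrap_ensemble, ood_score) are hypothetical.

```python
import numpy as np

def fit_linear_gaussian(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W plus a per-dimension residual variance."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    var = np.var(Y - Xb @ W, axis=0) + 1e-6  # small floor keeps the variance positive
    return W, var

def fit_bootstrap_ensemble(X, Y, n_models=5, seed=0):
    """Fit each ensemble member on its own resample (with replacement) of the data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(fit_linear_gaussian(X[idx], Y[idx]))
    return models

def ood_score(models, x, y_next):
    """Mean Gaussian negative log-likelihood of the observed next state.

    A high score means the observed transition is poorly explained by the
    dynamics learned during training.
    """
    xb = np.append(x, 1.0)
    nll = []
    for W, var in models:
        mu = xb @ W
        nll.append(0.5 * np.sum(np.log(2 * np.pi * var) + (y_next - mu) ** 2 / var))
    return float(np.mean(nll))

# Assumed toy training dynamics: s' = 0.9 * s + a + small Gaussian noise.
rng = np.random.default_rng(1)
S = rng.normal(size=(500, 2))
A = rng.normal(size=(500, 1))
X = np.hstack([S, A])
Y = 0.9 * S + A + 0.05 * rng.normal(size=(500, 2))

models = fit_bootstrap_ensemble(X, Y)

s = np.array([0.5, -0.2])
a = np.array([1.0])
x = np.concatenate([s, a])

in_dist = ood_score(models, x, 0.9 * s + a)  # transition follows the training dynamics
shifted = ood_score(models, x, 0.9 * s - a)  # perturbed MDP: action effect is flipped
print(in_dist, shifted)
```

In this sketch, an alert would be raised when the score exceeds a threshold calibrated on held-out in-distribution transitions; how to choose that threshold is left open here.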

Original language: English
Pages (from - to): 851-859
Number of pages: 9
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2023-May
Publication status: Published - 2023
Event: 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023 - London, United Kingdom
Duration: 29 May 2023 - 2 June 2023
