TY - JOUR
T1 - Out-of-Distribution Detection for Reinforcement Learning Agents with Probabilistic Dynamics Models
AU - Haider, Tom
AU - Roscher, Karsten
AU - da Roza, Felippe Schmoeller
AU - Günnemann, Stephan
N1 - Publisher Copyright:
© 2023 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2023
Y1 - 2023
AB - Reliability of reinforcement learning (RL) agents is a largely unsolved problem. Especially in situations that substantially differ from their training environment, RL agents often exhibit unpredictable behavior, potentially leading to performance loss, safety violations, or catastrophic failure. Reliable decision-making agents should therefore be able to raise an alert whenever they encounter situations they have never seen before and do not know how to handle. While this problem, known as out-of-distribution (OOD) detection, has received considerable attention in other domains such as image classification or sensory data analysis, it is less frequently studied in the context of RL. In fact, there is not even a common understanding of what OOD actually means in RL. In this work, we bridge this gap and approach OOD in RL from a general perspective. To this end, we formulate OOD in RL as severe perturbations of the Markov decision process (MDP). To detect such perturbations, we introduce a predictive algorithm using probabilistic dynamics models and bootstrapped ensembles. Since existing benchmarks are sparse and limited in their complexity, we also propose a set of evaluation scenarios with OOD occurrences. A detailed analysis of our approach shows superior detection performance compared to existing baselines from related fields.
KW - AI Safety
KW - Anomaly Detection
KW - OOD Detection
KW - Reinforcement Learning
KW - Sequential Decision Making
UR - http://www.scopus.com/inward/record.url?scp=85171291933&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85171291933
SN - 1548-8403
VL - 2023-May
SP - 851
EP - 859
JO - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
JF - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
T2 - 22nd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2023
Y2 - 29 May 2023 through 2 June 2023
ER -