TY - JOUR
T1 - Safe Reinforcement Learning Using Black-Box Reachability Analysis
AU - Selim, Mahmoud
AU - Alanwar, Amr
AU - Kousik, Shreyas
AU - Gao, Grace
AU - Pavone, Marco
AU - Johansson, Karl H.
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
N2 - Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, a trajectory-tracking point mass, and a hexarotor in wind with an unsafe set adjacent to the area of highest reward.
KW - Reinforcement learning
KW - robot safety
KW - task and motion planning
UR - http://www.scopus.com/inward/record.url?scp=85135247894&partnerID=8YFLogxK
DO - 10.1109/LRA.2022.3192205
M3 - Article
AN - SCOPUS:85135247894
SN - 2377-3766
VL - 7
SP - 10665
EP - 10672
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 4
ER -