TY - JOUR
T1 - Stable Inverse Reinforcement Learning
T2 - Policies From Control Lyapunov Landscapes
AU - Tesfazgi, Samuel
AU - Sprandl, Leonhard
AU - Lederer, Armin
AU - Hirche, Sandra
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem as the learning of control Lyapunov functions (CLFs) from demonstration data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares approach and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.
AB - Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem as the learning of control Lyapunov functions (CLFs) from demonstration data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares approach and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.
KW - Control Lyapunov function
KW - convex optimization
KW - inverse optimality
KW - inverse reinforcement learning
KW - learning from demonstrations
KW - sum of squares
UR - https://www.scopus.com/pages/publications/85208516132
U2 - 10.1109/OJCSYS.2024.3447464
DO - 10.1109/OJCSYS.2024.3447464
M3 - Article
AN - SCOPUS:85208516132
SN - 2694-085X
VL - 3
SP - 358
EP - 374
JO - IEEE Open Journal of Control Systems
JF - IEEE Open Journal of Control Systems
ER -