TY - JOUR
T1 - Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control
AU - Li, Cong
AU - Liu, Qingchen
AU - Zhou, Zhehua
AU - Buss, Martin
AU - Liu, Fangzhou
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2023/4/1
Y1 - 2023/4/1
N2 - This article proposes an off-policy risk-sensitive reinforcement learning (RL)-based control framework to jointly optimize task performance and constraint satisfaction in a disturbed environment. The risk-aware value function, constructed using pseudo control and risk-sensitive input and state penalty terms, is introduced to convert the original constrained robust stabilization problem into an equivalent unconstrained optimal control problem. Then, an off-policy RL algorithm is developed to learn the approximate solution to the risk-aware value function. During the learning process, the associated approximate optimal control policy satisfies both input and state constraints under disturbances. By replaying recorded experience data in the off-policy weight update law of the critic neural network, weight convergence is guaranteed. Moreover, online and offline algorithms are developed as principled ways to record informative experience data and achieve the sufficient excitation required for weight convergence. Proofs of system stability and weight convergence are provided. Simulation results demonstrate the validity of the proposed control framework.
AB - This article proposes an off-policy risk-sensitive reinforcement learning (RL)-based control framework to jointly optimize task performance and constraint satisfaction in a disturbed environment. The risk-aware value function, constructed using pseudo control and risk-sensitive input and state penalty terms, is introduced to convert the original constrained robust stabilization problem into an equivalent unconstrained optimal control problem. Then, an off-policy RL algorithm is developed to learn the approximate solution to the risk-aware value function. During the learning process, the associated approximate optimal control policy satisfies both input and state constraints under disturbances. By replaying recorded experience data in the off-policy weight update law of the critic neural network, weight convergence is guaranteed. Moreover, online and offline algorithms are developed as principled ways to record informative experience data and achieve the sufficient excitation required for weight convergence. Proofs of system stability and weight convergence are provided. Simulation results demonstrate the validity of the proposed control framework.
KW - Adaptive dynamic programming (ADP)
KW - input saturation
KW - off-policy risk-sensitive reinforcement learning (RL)
KW - robust control
KW - state constraint
UR - http://www.scopus.com/inward/record.url?scp=85141443850&partnerID=8YFLogxK
U2 - 10.1109/TSMC.2022.3213750
DO - 10.1109/TSMC.2022.3213750
M3 - Article
AN - SCOPUS:85141443850
SN - 2168-2216
VL - 53
SP - 2478
EP - 2491
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 4
ER -