TY - JOUR
T1 - PI-ELM: Reinforcement learning-based adaptable policy improvement for dynamical system
T2 - Information Sciences
AU - Hu, Yingbai
AU - Wang, Xu
AU - Liu, Yueyue
AU - Ding, Weiping
AU - Knoll, Alois
N1 - Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2023/12
Y1 - 2023/12
N2 - Behavioral cloning in imitation learning is theoretically sound and can capture and reproduce motor skills from expert demonstrations, but it adapts poorly to new environments when only a small dataset is available. This study improves adaptability by proposing a novel reinforcement learning strategy for low-level behavioral learning with a small number of expert demonstrations. Specifically, the policy improvement-based reinforcement learning framework is divided into two phases: the low level uses supervised learning with an extreme learning machine (ELM) to clone the behavior from demonstrations, which can further be represented as a dynamical system with policy parameters; and the high-level reinforcement learning improves the adaptability of the ELM in new tasks. In this paper, we bridge the gap between machine learning and stochastic optimal control and propose PI-ELM, an improved path integral-based reinforcement learning strategy, to learn the policy parameters of the low-level ELM. The proposed framework's performance and effectiveness are illustrated through several task experiments. The results indicate that our method significantly improves the adaptability of imitation learning in new scenarios, including single-task obstacle avoidance, via-points, anti-disturbance, and hybrid tasks.
AB - Behavioral cloning in imitation learning is theoretically sound and can capture and reproduce motor skills from expert demonstrations, but it adapts poorly to new environments when only a small dataset is available. This study improves adaptability by proposing a novel reinforcement learning strategy for low-level behavioral learning with a small number of expert demonstrations. Specifically, the policy improvement-based reinforcement learning framework is divided into two phases: the low level uses supervised learning with an extreme learning machine (ELM) to clone the behavior from demonstrations, which can further be represented as a dynamical system with policy parameters; and the high-level reinforcement learning improves the adaptability of the ELM in new tasks. In this paper, we bridge the gap between machine learning and stochastic optimal control and propose PI-ELM, an improved path integral-based reinforcement learning strategy, to learn the policy parameters of the low-level ELM. The proposed framework's performance and effectiveness are illustrated through several task experiments. The results indicate that our method significantly improves the adaptability of imitation learning in new scenarios, including single-task obstacle avoidance, via-points, anti-disturbance, and hybrid tasks.
KW - Adaptability
KW - Policy improvement
KW - Extreme learning machine
KW - Imitation learning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85172030221&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.119700
DO - 10.1016/j.ins.2023.119700
M3 - Article
AN - SCOPUS:85172030221
SN - 0020-0255
VL - 650
JO - Information Sciences
JF - Information Sciences
M1 - 119700
ER -