PI-ELM: Reinforcement learning-based adaptable policy improvement for dynamical system

Yingbai Hu, Xu Wang, Yueyue Liu, Weiping Ding, Alois Knoll

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Behavioral cloning, a form of imitation learning, is theoretically sound and can capture and reproduce motor skills from expert demonstrations, but it adapts poorly to new environments when only a small dataset is available. This study improves adaptability by proposing a novel reinforcement learning strategy for low-level behavioral learning from a small number of expert demonstrations. Specifically, the policy-improvement-based reinforcement learning framework is divided into two levels: the low level uses supervised learning with an extreme learning machine (ELM) to clone behavior from demonstrations, which is then represented as a dynamical system with policy parameters; the high-level reinforcement learning improves the adaptability of the ELM in new tasks. In this paper, we bridge the gap between machine learning and stochastic optimal control and propose an improved path-integral-based reinforcement learning strategy, PI-ELM, to learn the policy parameters of the low-level ELM. The proposed framework's performance and effectiveness are demonstrated in several task experiments. The results indicate that our method significantly improves the adaptability of imitation learning in new scenarios, including single-task obstacle avoidance, via-points, anti-disturbance, and hybrid tasks.
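The abstract only outlines the two-level structure, so the sketch below illustrates one way such a scheme could look: an ELM fit to demonstrations by ridge regression, followed by a generic PI^2-style (path-integral) update of the ELM output weights. It is a minimal sketch under stated assumptions, not the authors' implementation; the demonstration data, rollout model, cost function, and hyperparameters are hypothetical placeholders.

```python
# Minimal sketch (not the paper's code): ELM behavioral cloning + a PI^2-style
# path-integral update of the ELM output weights (the "policy parameters").
import numpy as np

rng = np.random.default_rng(0)

# --- Low level: ELM behavioral cloning ---------------------------------------
n_in, n_hidden, n_out = 2, 50, 2          # state dim, hidden units, action dim
W_in = rng.normal(size=(n_in, n_hidden))  # random input weights (fixed in an ELM)
b = rng.normal(size=n_hidden)             # random hidden biases (fixed in an ELM)

def hidden(X):
    """Hidden-layer activations of the ELM."""
    return np.tanh(X @ W_in + b)

# Placeholder demonstrations: states X_demo and desired velocities Y_demo
# (in practice these would come from recorded expert trajectories).
X_demo = rng.uniform(-1, 1, size=(200, n_in))
Y_demo = -X_demo + 0.1 * rng.normal(size=(200, n_out))

# Output weights = policy parameters, solved in closed form (ridge regression).
H = hidden(X_demo)
theta = np.linalg.solve(H.T @ H + 1e-3 * np.eye(n_hidden), H.T @ Y_demo)

def policy(x, theta):
    """Dynamical-system policy: maps state to a velocity command."""
    return (hidden(x[None, :]) @ theta)[0]

# --- High level: PI^2-style refinement of the output weights -----------------
def rollout_cost(theta, goal=np.zeros(n_in), T=50, dt=0.05):
    """Illustrative cost: squared distance to a goal accumulated along a rollout."""
    x, cost = np.array([1.0, 1.0]), 0.0
    for _ in range(T):
        x = x + dt * policy(x, theta)
        cost += np.sum((x - goal) ** 2)
    return cost

K, sigma, lam = 20, 0.05, 1.0             # rollouts, exploration noise, temperature
for _ in range(30):
    eps = sigma * rng.normal(size=(K, *theta.shape))    # parameter perturbations
    costs = np.array([rollout_cost(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)            # path-integral weighting
    w /= w.sum()
    theta = theta + np.tensordot(w, eps, axes=1)        # cost-weighted update
print("final rollout cost:", rollout_cost(theta))
```

The design choice illustrated here is the one highlighted in the abstract: only the ELM output weights are treated as policy parameters, so behavior cloning stays a closed-form regression while the reinforcement learning level adapts those same weights to a new task through sampled rollouts.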

Original language: English
Article number: 119700
Journal: Information Sciences
Volume: 650
DOIs
State: Published - Dec 2023

Keywords

  • Adaptability
  • Policy improvement
  • Extreme learning machine
  • Imitation learning
  • Reinforcement learning

