TY - JOUR
T1 - Learning CPG-based biped locomotion with a policy gradient method
T2 - Application to a humanoid robot
AU - Endo, Gen
AU - Morimoto, Jun
AU - Matsubara, Takamitsu
AU - Nakanishi, Jun
AU - Cheng, Gordon
PY - 2008/2
Y1 - 2008/2
AB - In this paper we describe a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve CPG-based biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm with CPG by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a few thousand trials by numerical simulations and the controller obtained in numerical simulation achieves stable walking with a physical robot in the real world. Numerical simulations and hardware experiments evaluate the walking velocity and stability. The results suggest that the learning algorithm is capable of adapting to environmental changes. Furthermore, we present an online learning scheme with an initial policy for a hardware robot to improve the controller within 200 iterations.
KW - Bipedal locomotion
KW - Central pattern generator
KW - Humanoid robots
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=38649142135&partnerID=8YFLogxK
U2 - 10.1177/0278364907084980
DO - 10.1177/0278364907084980
M3 - Article
AN - SCOPUS:38649142135
SN - 0278-3649
VL - 27
SP - 213
EP - 228
JO - International Journal of Robotics Research
JF - International Journal of Robotics Research
IS - 2
ER -