TY - GEN
T1 - Data-efficient control policy search using residual dynamics learning
AU - Saveriano, Matteo
AU - Yin, Yuchao
AU - Falco, Pietro
AU - Lee, Dongheui
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/13
Y1 - 2017/12/13
N2 - In this work, we propose a model-based, data-efficient approach to reinforcement learning. The main idea of our algorithm is to combine simulated and real rollouts to find an optimal control policy efficiently. While performing rollouts on the robot, we exploit sensory data to learn a probabilistic model of the residual difference between the measured state and the state predicted by a simplified model. The simplified model can be any dynamical system, from a very accurate one to a simple linear one. The residual difference is learned with Gaussian processes. Hence, we assume that the difference between the real system and the simplified model is Gaussian distributed, which is a weaker assumption than requiring the real system itself to be Gaussian distributed. The simplified model and the learned residuals are combined to predict the real system's behavior and to search for an optimal policy. Simulations and experiments show that our approach significantly reduces the number of rollouts needed to find an optimal control policy for the real system.
UR - https://www.scopus.com/pages/publications/85041955392
U2 - 10.1109/IROS.2017.8206343
DO - 10.1109/IROS.2017.8206343
M3 - Conference contribution
AN - SCOPUS:85041955392
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 4709
EP - 4715
BT - IROS 2017 - IEEE/RSJ International Conference on Intelligent Robots and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017
Y2 - 24 September 2017 through 28 September 2017
ER -