Data-efficient control policy search using residual dynamics learning

Matteo Saveriano, Yuchao Yin, Pietro Falco, Dongheui Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations

Abstract

In this work, we propose a model-based, data-efficient approach to reinforcement learning. The main idea of our algorithm is to combine simulated and real rollouts to efficiently find an optimal control policy. While performing rollouts on the robot, we exploit sensory data to learn a probabilistic model of the residual difference between the measured state and the state predicted by a simplified model. The simplified model can be any dynamical system, from a very accurate model to a simple linear one. The residual difference is learned with Gaussian processes. Hence, we assume only that the difference between the real system and the simplified model is Gaussian distributed, which is less restrictive than assuming that the real system itself is Gaussian distributed. The combination of the simplified model and the learned residuals is used to predict the behavior of the real system and to search for an optimal policy. Simulations and experiments show that our approach significantly reduces the number of rollouts needed to find an optimal control policy for the real system.
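The residual-learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional `simplified_model`, the `real_system` stand-in for the robot, the `ResidualGP` class, and all hyperparameter values are hypothetical choices made for illustration. The sketch fits a Gaussian process (RBF kernel, NumPy only) to the residual between measured and model-predicted next states, then predicts the real system as the simplified model plus the learned residual mean.

```python
import numpy as np

def rbf_kernel(A, B, length_scale, signal_var):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

class ResidualGP:
    """Zero-mean GP regression on residuals (hypothetical helper class)."""

    def __init__(self, length_scale=0.5, signal_var=0.01, noise_var=1e-6):
        self.ls, self.sv, self.nv = length_scale, signal_var, noise_var

    def fit(self, X, r):
        self.X = X
        K = rbf_kernel(X, X, self.ls, self.sv) + self.nv * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} r, computed with two triangular-system solves.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, r))
        return self

    def predict(self, Xs):
        Ks = rbf_kernel(Xs, self.X, self.ls, self.sv)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.sv - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)

def simplified_model(x, u):
    # Assumed simple linear model of the dynamics.
    return 0.9 * x + 0.1 * u

def real_system(x, u):
    # Stand-in for the robot: same linear part plus an unmodeled nonlinearity.
    return 0.9 * x + 0.1 * u + 0.05 * np.sin(3.0 * x)

# "Real rollouts": sampled (state, action) pairs and the measured next states.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))  # columns: state, action
residuals = real_system(X[:, 0], X[:, 1]) - simplified_model(X[:, 0], X[:, 1])
gp = ResidualGP().fit(X, residuals)

def predict_next(x, u):
    # Combined prediction: simplified model plus learned residual mean.
    mean, var = gp.predict(np.array([[x, u]]))
    return simplified_model(x, u) + mean[0], var[0]

x_test, u_test = 0.3, -0.2
combined, variance = predict_next(x_test, u_test)
err_simple = abs(simplified_model(x_test, u_test) - real_system(x_test, u_test))
err_combined = abs(combined - real_system(x_test, u_test))
print(f"simplified-model error: {err_simple:.4f}, combined error: {err_combined:.4f}")
```

In a policy search loop, `predict_next` would take the place of the real robot during simulated rollouts, and the predictive variance could be propagated to account for model uncertainty; how the paper does this is not specified in the abstract.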

Original language: English
Title of host publication: IROS 2017 - IEEE/RSJ International Conference on Intelligent Robots and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4709-4715
Number of pages: 7
ISBN (Electronic): 9781538626825
State: Published - 13 Dec 2017
Externally published: Yes
Event: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017 - Vancouver, Canada
Duration: 24 Sep 2017 to 28 Sep 2017

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
Volume: 2017-September
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017
Country/Territory: Canada
City: Vancouver
Period: 24/09/17 to 28/09/17
