Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation

Taisuke Kobayashi, Takumi Aotani, Julio Rogelio Guadarrama-Olvera, Emmanuel Dean-Leon, Gordon Cheng

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

This paper presents a new actor-critic (AC) algorithm, named 'RP-AC', based on a biological reward-punishment framework for reinforcement learning (RL). RL can yield capabilities that allow robots to take over complicated and dangerous tasks from humans. Such capabilities, however, may require accounting for safety while achieving the tasks, because dangerous states and actions could cause the robot to break down and cause injuries. In the reward-punishment framework, robots receive both a positive reward (called reward), representing the degree of task achievement, and a negative reward (called punishment), representing the risk of the current state and action, and RL algorithms compose the two. In this paper, to control robots more directly, the AC algorithm, which operates in a continuous action space, is employed. To this end, we propose policy gradients together with a way to compose reward and punishment or, more accurately, their value functions. Instead of merely composing them at a fixed ratio, we theoretically introduce immediate reward and punishment into the policy gradients, as animals do in decision making. In pushing-task experiments, whereas the vanilla AC fails to acquire the task due to too much emphasis on safety, the proposed RP-AC successfully acquires the task with the same level of safety.
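
For orientation only, the fixed-ratio composition that the abstract contrasts RP-AC against can be sketched as follows. This is not the RP-AC update itself, which the abstract does not specify. It assumes two separate critics (one for reward, one for punishment), one-step temporal-difference errors, and a hypothetical mixing weight `ratio`; all names here are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One step of experience with separate reward and punishment signals.

    All field names are hypothetical; both signals are assumed non-negative.
    """
    reward: float        # degree of task achievement
    punishment: float    # risk of the current state and action
    v_r: float           # reward critic's estimate at the current state
    v_r_next: float      # reward critic's estimate at the next state
    v_p: float           # punishment critic's estimate at the current state
    v_p_next: float      # punishment critic's estimate at the next state


def td_error(signal: float, value: float, next_value: float, gamma: float) -> float:
    """Standard one-step temporal-difference error for a single critic."""
    return signal + gamma * next_value - value


def composed_td_error(t: Transition, gamma: float = 0.99, ratio: float = 0.5) -> float:
    """Mix the two TD errors at a fixed ratio (the baseline, not RP-AC).

    A positive result would push the actor toward the taken action,
    a negative result away from it.
    """
    delta_r = td_error(t.reward, t.v_r, t.v_r_next, gamma)
    delta_p = td_error(t.punishment, t.v_p, t.v_p_next, gamma)
    return (1.0 - ratio) * delta_r - ratio * delta_p
```

With a large `ratio`, the punishment term dominates and the actor can become overly cautious, which is in the spirit of the failure the abstract reports for the vanilla AC.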

Original language: English
Title of host publication: 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019
Editors: Amir Aly, Estela Bicho, Sofiane Boucenna, Bruno Castro da Silva, Mohamed Chetouani, Angel P. del Pobil, Julien Diard, Stephane Doncieux, Tilbe Goksun, Angela Grimminger, Frank Guerin, Yoshinobu Hagiwara, Lorenzo Jamone, Sinan Kalkan, Bruno Lara, Clement Moulin-Frier, Shingo Murata, Takayuki Nagai, Yukie Nagai, Iris Nomikou, Masaki Ogino, Pierre-Yves Oudeyer, Alfredo F. Pereira, Alexandre Pitti, Joanna Raczaszek-Leonardi, Sebastian Risi, Benjamin Rosman, Yulia Sandamirskaya, Malte Schilling, Alessandra Sciutti, Patricia Shaw, Andrea Soltoggio, Michael Spranger, Tadahiro Taniguchi, Serge Thill, Jochen Triesch, Emre Ugur, Anna-Lisa Vollmer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781538681282
State: Published - Aug 2019
Externally published: Yes
Event: 9th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019 - Oslo, Norway
Duration: 19 Aug 2019 to 22 Aug 2019

Publication series

Name: 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019

Conference

Conference: 9th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019
Country/Territory: Norway
City: Oslo
Period: 19/08/19 to 22/08/19
