TY - GEN
T1 - Reward-Punishment Actor-Critic Algorithm Applying to Robotic Non-grasping Manipulation
AU - Kobayashi, Taisuke
AU - Aotani, Takumi
AU - Guadarrama-Olvera, Julio Rogelio
AU - Dean-Leon, Emmanuel
AU - Cheng, Gordon
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - This paper presents a new actor-critic (AC) algorithm based on a biological reward-punishment framework for reinforcement learning (RL), named 'RP-AC'. RL can yield capabilities whereby robots take over complicated and dangerous tasks instead of humans. Such capability, however, may require accounting for safety when achieving the tasks, because dangerous states and actions could cause the robot to break down and induce injuries. In the reward-punishment framework, robots gain both a positive reward (called reward) as the degree of task achievement and a negative reward (called punishment) as the risk of the current state and action, which are composed in RL algorithms. In this paper, to control robots more directly, the AC algorithm, which operates in continuous action space, is employed. To this end, we propose policy gradients with a way to compose reward and punishment, or more precisely, their value functions. Instead of merely composing them at a fixed ratio, we theoretically introduce immediate reward and punishment into the policy gradients, as animals do in decision making. In pushing task experiments, whereas the vanilla AC fails to acquire the task due to too much emphasis on safety, the proposed RP-AC successfully acquires the task with the same level of safety.
AB - This paper presents a new actor-critic (AC) algorithm based on a biological reward-punishment framework for reinforcement learning (RL), named 'RP-AC'. RL can yield capabilities whereby robots take over complicated and dangerous tasks instead of humans. Such capability, however, may require accounting for safety when achieving the tasks, because dangerous states and actions could cause the robot to break down and induce injuries. In the reward-punishment framework, robots gain both a positive reward (called reward) as the degree of task achievement and a negative reward (called punishment) as the risk of the current state and action, which are composed in RL algorithms. In this paper, to control robots more directly, the AC algorithm, which operates in continuous action space, is employed. To this end, we propose policy gradients with a way to compose reward and punishment, or more precisely, their value functions. Instead of merely composing them at a fixed ratio, we theoretically introduce immediate reward and punishment into the policy gradients, as animals do in decision making. In pushing task experiments, whereas the vanilla AC fails to acquire the task due to too much emphasis on safety, the proposed RP-AC successfully acquires the task with the same level of safety.
UR - http://www.scopus.com/inward/record.url?scp=85073689528&partnerID=8YFLogxK
U2 - 10.1109/DEVLRN.2019.8850699
DO - 10.1109/DEVLRN.2019.8850699
M3 - Conference contribution
AN - SCOPUS:85073689528
T3 - 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019
SP - 37
EP - 42
BT - 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019
A2 - Aly, Amir
A2 - Bicho, Estela
A2 - Boucenna, Sofiane
A2 - Castro da Silva, Bruno
A2 - Chetouani, Mohamed
A2 - del Pobil, Angel P.
A2 - Diard, Julien
A2 - Doncieux, Stephane
A2 - Goksun, Tilbe
A2 - Grimminger, Angela
A2 - Guerin, Frank
A2 - Hagiwara, Yoshinobu
A2 - Jamone, Lorenzo
A2 - Kalkan, Sinan
A2 - Lara, Bruno
A2 - Moulin-Frier, Clement
A2 - Murata, Shingo
A2 - Nagai, Takayuki
A2 - Nagai, Yukie
A2 - Nomikou, Iris
A2 - Ogino, Masaki
A2 - Oudeyer, Pierre-Yves
A2 - Pereira, Alfredo F.
A2 - Pitti, Alexandre
A2 - Raczaszek-Leonardi, Joanna
A2 - Risi, Sebastian
A2 - Rosman, Benjamin
A2 - Sandamirskaya, Yulia
A2 - Schilling, Malte
A2 - Sciutti, Alessandra
A2 - Shaw, Patricia
A2 - Soltoggio, Andrea
A2 - Spranger, Michael
A2 - Taniguchi, Tadahiro
A2 - Thill, Serge
A2 - Triesch, Jochen
A2 - Ugur, Emre
A2 - Vollmer, Anna-Lisa
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th Joint IEEE International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019
Y2 - 19 August 2019 through 22 August 2019
ER -