TY - GEN
T1 - Deep Q-learning for the Control of PLC-based Automated Production Systems
AU - Zinn, Jonas
AU - Vogel-Heuser, Birgit
AU - Ockier, Paulina
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
AB - This paper evaluates the use of Deep Reinforcement Learning to control Programmable Logic Controller-based automated Production Systems, which are characterized by multiple end-effectors that are actuated in only one or two axes. Due to the large number of actuators, of which only a few affect the processing of a workpiece at any given time, these systems are challenging to learn. In this paper, Deep Q-learning is applied to a small use case focusing on sorting workpieces by color in a simulation of such a production system. The basic algorithm is compared to four commonly used extensions: Double Q-learning, Dueling Networks, Prioritized Experience Replay, and Hindsight Experience Replay. Within the scope of this paper, simplifications are applied to the state and action spaces. While the baseline implementation of Deep Q-learning is able to correctly sort 30 previously seen workpiece combinations, it does not reliably generalize to unseen ones within 45,000 training episodes. In contrast, the algorithm using all four considered extensions is able to reliably generalize to all 81 possible workpiece combinations.
UR - http://www.scopus.com/inward/record.url?scp=85094114770&partnerID=8YFLogxK
DO - 10.1109/CASE48305.2020.9216863
M3 - Conference contribution
AN - SCOPUS:85094114770
T3 - IEEE International Conference on Automation Science and Engineering
SP - 1434
EP - 1440
BT - 2020 IEEE 16th International Conference on Automation Science and Engineering, CASE 2020
PB - IEEE Computer Society
T2 - 16th IEEE International Conference on Automation Science and Engineering, CASE 2020
Y2 - 20 August 2020 through 21 August 2020
ER -