TY - JOUR
T1 - Towards a Methodology for Production Scheduling Using Reinforcement Learning Under Consideration of a Company's Individual Tasks and Goals
AU - Wegmann, Marc
AU - Zaeh, Michael F.
N1 - Publisher Copyright:
© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of the 56th CIRP International Conference on Manufacturing Systems 2023.
PY - 2023
Y1 - 2023
AB - The impacts of the COVID-19 pandemic and the chip crisis on industrial production are just two examples that emphasize the complex and volatile environment production systems must cope with. Operational scheduling tasks in particular suffer from these influences due to the increased complexity of decision-making as well as frequent rescheduling activities. At the same time, they strongly affect production-related key performance indicators such as lead time or resource utilization. As several publications have already shown, applying innovative, data-based methods from the field of Reinforcement Learning (RL) to complex scheduling tasks offers great potential for handling the challenges arising from a complex and volatile environment. The building blocks of an RL approach strongly depend on a company's individual task specifications and optimization goals. Yet, a methodology that considers these specifications in the design of RL approaches for production scheduling has not been introduced, preventing the transfer from laboratory examples to wide-ranging industrial applications. To address this research gap, this paper provides a conceptual methodology for generating RL solutions tailored to the scheduling tasks and objectives at hand. The proposed methodology consists of four central modules that constitute the building blocks of an RL solution. The first module derives the action space of the RL approach from the underlying scheduling tasks. The second module constructs the reward function based on a company's individual scheduling targets. The third module derives the state vector from the components of the reward function. The last module selects an appropriate optimization algorithm and merges the previous modules to learn an optimal scheduling policy in a simulation environment that can be applied to real-world problems. As a result, the application of RL-based scheduling enables production systems to meet current requirements and evolve into resilient and self-optimizing systems.
KW - Artificial Intelligence
KW - Order Processing
KW - Production Scheduling
KW - Reinforcement Learning
KW - Resilience
UR - http://www.scopus.com/inward/record.url?scp=85184568598&partnerID=8YFLogxK
DO - 10.1016/j.procir.2023.09.012
M3 - Conference article
AN - SCOPUS:85184568598
SN - 2212-8271
VL - 120
SP - 416
EP - 421
JO - Procedia CIRP
JF - Procedia CIRP
T2 - 56th CIRP International Conference on Manufacturing Systems, CIRP CMS 2023
Y2 - 24 October 2023 through 26 October 2023
ER -