TY - JOUR
T1 - Deep Reinforcement Learning-based Multi-Objective Scheduling for Distributed Heterogeneous Hybrid Flow Shops with Blocking Constraints
AU - Sun, Xueyan
AU - Shen, Weiming
AU - Fan, Jiaxin
AU - Vogel-Heuser, Birgit
AU - Bi, Fandi
AU - Zhang, Chunjiang
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2025
Y1 - 2025
N2 - This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) with the objectives of minimizing total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the common objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatching rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements over the basic PPO, and that the proposed IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
KW - Blocking constraints
KW - Distributed hybrid flow-shop scheduling
KW - Multi-agent deep reinforcement learning
KW - Multi-objective Markov decision process
KW - Proximal policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85217901613&partnerID=8YFLogxK
U2 - 10.1016/j.eng.2024.11.033
DO - 10.1016/j.eng.2024.11.033
M3 - Article
AN - SCOPUS:85217901613
SN - 2095-8099
JO - Engineering
JF - Engineering
ER -
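
A minimal Python sketch of the dynamically weighted vector reward described in the abstract, assuming a two-component reward (tardiness increment, energy increment) that a factory agent scalarizes with weights that shift over the episode before a PPO-style update; all names, the weight schedule, and the sample values are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only; not the paper's code.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class RewardVector:
        tardiness: float  # increase in total tardiness caused by the last decision
        energy: float     # increase in total energy consumption

    def dynamic_weights(step: int, horizon: int) -> Tuple[float, float]:
        # Assumed schedule: shift emphasis gradually from tardiness to energy.
        frac = min(step / max(horizon, 1), 1.0)
        w_tardiness = 1.0 - 0.5 * frac
        return w_tardiness, 1.0 - w_tardiness

    def scalarize(r: RewardVector, step: int, horizon: int) -> float:
        # Collapse the vector reward into the scalar fed to a PPO-style update.
        w_t, w_e = dynamic_weights(step, horizon)
        # Both objectives are minimized, so their increments enter with a negative sign.
        return -(w_t * r.tardiness + w_e * r.energy)

    if __name__ == "__main__":
        episode: List[RewardVector] = [
            RewardVector(tardiness=3.0, energy=12.5),
            RewardVector(tardiness=0.0, energy=9.8),
            RewardVector(tardiness=5.5, energy=11.2),
        ]
        print([round(scalarize(r, t, len(episode)), 3) for t, r in enumerate(episode)])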