TY - GEN
T1 - A Comparison of Value-based and Policy-based Reinforcement Learning for Monitoring-informed Railway Maintenance Planning
AU - Arcieri, Giacomo
AU - Hoelzl, Cyprien
AU - Schwery, Oliver
AU - Straub, Daniel
AU - Papakonstantinou, Konstantinos G.
AU - Chatzi, Eleni
N1 - Publisher Copyright:
© 2023 by DEStech Publications, Inc. All rights reserved
PY - 2023
Y1 - 2023
N2 - Optimal maintenance planning for railway infrastructure and assets forms a complex sequential decision-making problem. Railways are naturally subject to deterioration, which can result in compromised service and increased safety risks and costs. Maintenance actions ought to be proactively planned to prevent the adverse effects of deterioration and the associated costs. Such predictive actions can be planned based on monitoring data, which are often indirect and noisy, thus offering an uncertain assessment of the railway condition. From a mathematical perspective, this forms a stochastic control problem under data uncertainty, which can be cast as a Partially Observable Markov Decision Process (POMDP). In this work, we model the real-world problem of railway optimal maintenance planning as a POMDP, with the problem parameters inferred from real-world monitoring data. The POMDP model serves to infer beliefs over a set of hidden states, which aim to capture the evolution of the underlying deterioration process. The maintenance optimization problem is here ultimately solved via the use of deep Reinforcement Learning (RL) techniques, which allow for a more flexible and broad search over the policy space when compared to classical POMDP solution algorithms. A comparison of value-based and policy-based RL methods is also offered, which exploit deep learning architectures to model either action-value functions (i.e., the expected returns from an action-state pair) or directly the policy. Our work shows how this complex planning problem can be effectively solved via deep RL to derive an optimized maintenance policy of railway tracks, demonstrated on real-world monitoring data, and offers insights into the solution provided by different classes of RL algorithms.
AB - Optimal maintenance planning for railway infrastructure and assets forms a complex sequential decision-making problem. Railways are naturally subject to deterioration, which can result in compromised service and increased safety risks and costs. Maintenance actions ought to be proactively planned to prevent the adverse effects of deterioration and the associated costs. Such predictive actions can be planned based on monitoring data, which are often indirect and noisy, thus offering an uncertain assessment of the railway condition. From a mathematical perspective, this forms a stochastic control problem under data uncertainty, which can be cast as a Partially Observable Markov Decision Process (POMDP). In this work, we model the real-world problem of railway optimal maintenance planning as a POMDP, with the problem parameters inferred from real-world monitoring data. The POMDP model serves to infer beliefs over a set of hidden states, which aim to capture the evolution of the underlying deterioration process. The maintenance optimization problem is here ultimately solved via the use of deep Reinforcement Learning (RL) techniques, which allow for a more flexible and broad search over the policy space when compared to classical POMDP solution algorithms. A comparison of value-based and policy-based RL methods is also offered, which exploit deep learning architectures to model either action-value functions (i.e., the expected returns from an action-state pair) or directly the policy. Our work shows how this complex planning problem can be effectively solved via deep RL to derive an optimized maintenance policy of railway tracks, demonstrated on real-world monitoring data, and offers insights into the solution provided by different classes of RL algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85182256433&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85182256433
T3 - Structural Health Monitoring 2023: Designing SHM for Sustainability, Maintainability, and Reliability - Proceedings of the 14th International Workshop on Structural Health Monitoring
SP - 2449
EP - 2459
BT - Structural Health Monitoring 2023
A2 - Farhangdoust, Saman
A2 - Guemes, Alfredo
A2 - Chang, Fu-Kuo
PB - DEStech Publications
T2 - 14th International Workshop on Structural Health Monitoring: Designing SHM for Sustainability, Maintainability, and Reliability, IWSHM 2023
Y2 - 12 September 2023 through 14 September 2023
ER -