A Comparison of Value-based and Policy-based Reinforcement Learning for Monitoring-informed Railway Maintenance Planning

Giacomo Arcieri, Cyprien Hoelzl, Oliver Schwery, Daniel Straub, Konstantinos G. Papakonstantinou, Eleni Chatzi

Research output: Conference contribution in Book/Report/Conference proceeding (peer-reviewed)

Abstract

Optimal maintenance planning for railway infrastructure and assets forms a complex sequential decision-making problem. Railways are naturally subject to deterioration, which can result in compromised service and increased safety risks and costs. Maintenance actions ought to be proactively planned to prevent the adverse effects of deterioration and the associated costs. Such predictive actions can be planned based on monitoring data, which are often indirect and noisy, thus offering an uncertain assessment of the railway condition. From a mathematical perspective, this forms a stochastic control problem under data uncertainty, which can be cast as a Partially Observable Markov Decision Process (POMDP). In this work, we model the real-world problem of optimal railway maintenance planning as a POMDP, with the problem parameters inferred from real-world monitoring data. The POMDP model serves to infer beliefs over a set of hidden states, which aim to capture the evolution of the underlying deterioration process. The maintenance optimization problem is then solved via deep Reinforcement Learning (RL) techniques, which allow for a more flexible and broader search over the policy space than classical POMDP solution algorithms. We further compare value-based and policy-based RL methods, which exploit deep learning architectures to model either action-value functions (i.e., the expected return of a state-action pair) or the policy directly. Our work shows how this complex planning problem can be effectively solved via deep RL to derive an optimized maintenance policy for railway tracks, demonstrated on real-world monitoring data, and offers insights into the solutions provided by different classes of RL algorithms.
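As a minimal illustration of the belief inference the abstract describes, the sketch below implements a standard discrete-state Bayes filter for a POMDP: the belief over hidden deterioration states is propagated through a transition model and reweighted by the likelihood of a noisy observation. The transition matrix, observation model, and state labels here are hypothetical placeholders, not the parameters inferred in the paper.

```python
import numpy as np

def belief_update(belief, T, O, obs):
    """One step of a discrete Bayes filter for a POMDP.

    belief : current probability vector over hidden states
    T      : transition matrix, T[s, s'] = P(s' | s)
    O      : observation model, O[s', o] = P(o | s')
    obs    : index of the observation actually received
    """
    predicted = belief @ T               # prior over the next hidden state
    posterior = predicted * O[:, obs]    # weight by observation likelihood
    return posterior / posterior.sum()   # normalize to a probability vector

# Illustrative three-state deterioration process (intact -> worn -> damaged);
# all numbers below are assumed for the sketch, not taken from the paper.
T = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
O = np.array([[0.8, 0.2],    # indirect, noisy monitoring signal
              [0.4, 0.6],
              [0.1, 0.9]])

b0 = np.array([1.0, 0.0, 0.0])          # start certain: track is intact
b1 = belief_update(b0, T, O, obs=1)     # a "degraded-looking" measurement
```

In a POMDP solution, policies (whether derived from action-value functions or parameterized directly) act on such belief vectors rather than on the unobservable true state.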

Original language: English
Title of host publication: Structural Health Monitoring 2023
Subtitle of host publication: Designing SHM for Sustainability, Maintainability, and Reliability - Proceedings of the 14th International Workshop on Structural Health Monitoring
Editors: Saman Farhangdoust, Alfredo Guemes, Fu-Kuo Chang
Publisher: DEStech Publications
Pages: 2449-2459
Number of pages: 11
ISBN (Electronic): 9781605956930
State: Published - 2023
Event: 14th International Workshop on Structural Health Monitoring: Designing SHM for Sustainability, Maintainability, and Reliability, IWSHM 2023 - Stanford, United States
Duration: 12 Sep 2023 - 14 Sep 2023

Publication series

Name: Structural Health Monitoring 2023: Designing SHM for Sustainability, Maintainability, and Reliability - Proceedings of the 14th International Workshop on Structural Health Monitoring

Conference

Conference: 14th International Workshop on Structural Health Monitoring: Designing SHM for Sustainability, Maintainability, and Reliability, IWSHM 2023
Country/Territory: United States
City: Stanford
Period: 12/09/23 - 14/09/23
