Global Rewards in Multi-Agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems

  • Technical University of Munich
  • École Polytechnique de Montréal

Research output: Contribution to journal, Conference article, peer-review

6 Scopus citations

Abstract

We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects them, with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but they train agents on local rewards, which distorts the reward signal with respect to the system-wide profit and leads to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems. It resolves the goal conflicts that have so far existed between the trained agents and the operator by assigning rewards to agents using a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis, which shows that using global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
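
To illustrate the general idea of assigning credit from a global reward with a counterfactual baseline, the following is a minimal sketch and not the authors' implementation. It assumes a COMA-style baseline, in which one agent's action is marginalized out under its own policy while the other agents' actions stay fixed. The names `counterfactual_advantage` and `toy_global_profit`, the two-action setup (0 = reject, 1 = serve), and the revenue and cost values are all hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence, Tuple


def counterfactual_advantage(
    q: Callable[[Tuple[int, ...]], float],
    joint_action: Sequence[int],
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Global value of the joint action minus a counterfactual baseline.

    The baseline is the expected global value when only `agent` resamples
    its action from `policy_probs`; all other agents' actions are kept.
    """
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        alt_joint = list(joint_action)
        alt_joint[agent] = alt_action  # swap in the alternative action
        baseline += prob * q(tuple(alt_joint))
    return q(tuple(joint_action)) - baseline


def toy_global_profit(joint_action: Tuple[int, ...]) -> float:
    """Hypothetical system-wide profit for a single customer request.

    Action 1 dispatches a vehicle to the request, action 0 rejects it.
    The request pays 10 once if at least one vehicle serves it, and each
    dispatched vehicle costs 2 (assumed values for illustration only).
    """
    dispatched = sum(1 for a in joint_action if a == 1)
    revenue = 10.0 if dispatched > 0 else 0.0
    return revenue - 2.0 * dispatched


# Both vehicles drive to the same request: vehicle 0 is redundant.
redundant = counterfactual_advantage(toy_global_profit, (1, 1), 0, [0.5, 0.5])

# Only vehicle 0 serves the request: its action creates the profit.
useful = counterfactual_advantage(toy_global_profit, (1, 0), 0, [0.5, 0.5])

print(redundant, useful)
```

In this toy setting the redundant dispatch receives a negative advantage and the useful dispatch a positive one, even though both are scored against the same system-wide profit. This is the kind of per-agent signal a counterfactual baseline is meant to extract from a global reward.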

Original language: English
Pages (from-to): 260-272
Number of pages: 13
Journal: Proceedings of Machine Learning Research
Volume: 242
State: Published - 2024
Event: 6th Annual Learning for Dynamics and Control Conference, L4DC 2024 - Oxford, United Kingdom
Duration: 15 Jul 2024 - 17 Jul 2024

Keywords

  • autonomous mobility on demand
  • credit assignment
  • deep reinforcement learning
  • multi-agent learning
