TY - GEN
T1 - Parameter Sharing Reinforcement Learning for Modeling Multi-Agent Driving Behavior in Roundabout Scenarios
AU - Konstantinidis, Fabian
AU - Hofmann, Ulrich
AU - Sackmann, Moritz
AU - Thielecke, Jorn
AU - De Candido, Oliver
AU - Utschick, Wolfgang
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/9/19
Y1 - 2021/9/19
N2 - Modeling other drivers' behavior in highly interactive traffic situations, such as roundabouts, is a challenging task. We address this task using a Multi-Agent Reinforcement Learning (MARL) approach that learns a driving policy based on a minimal set of assumptions: drivers want to move forward and avoid collisions while maintaining low accelerations. Each agent's actions depend only on its observation of the local environment; no explicit communication between agents is possible. To teach the agents to interact safely with each other and, for example, to respect right-of-way rules, we use parameter sharing: during training, all vehicles are controlled by the same policy, and the aggregated experiences are used to improve that policy. Moreover, parameter sharing enables us to use the efficient Soft Actor-Critic (SAC) algorithm for training. The approach is evaluated in a roundabout setting with different traffic densities. Furthermore, the model's ability to generalize is assessed in a roundabout not seen during training. In both settings, success rates above 97% demonstrate that a safe and transferable policy is learned.
AB - Modeling other drivers' behavior in highly interactive traffic situations, such as roundabouts, is a challenging task. We address this task using a Multi-Agent Reinforcement Learning (MARL) approach that learns a driving policy based on a minimal set of assumptions: drivers want to move forward and avoid collisions while maintaining low accelerations. Each agent's actions depend only on its observation of the local environment; no explicit communication between agents is possible. To teach the agents to interact safely with each other and, for example, to respect right-of-way rules, we use parameter sharing: during training, all vehicles are controlled by the same policy, and the aggregated experiences are used to improve that policy. Moreover, parameter sharing enables us to use the efficient Soft Actor-Critic (SAC) algorithm for training. The approach is evaluated in a roundabout setting with different traffic densities. Furthermore, the model's ability to generalize is assessed in a roundabout not seen during training. In both settings, success rates above 97% demonstrate that a safe and transferable policy is learned.
UR - http://www.scopus.com/inward/record.url?scp=85118421664&partnerID=8YFLogxK
U2 - 10.1109/ITSC48978.2021.9565031
DO - 10.1109/ITSC48978.2021.9565031
M3 - Conference contribution
AN - SCOPUS:85118421664
T3 - IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC
SP - 1974
EP - 1981
BT - 2021 IEEE International Intelligent Transportation Systems Conference, ITSC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Intelligent Transportation Systems Conference, ITSC 2021
Y2 - 19 September 2021 through 22 September 2021
ER -