TY - GEN
T1 - Lane-Merging Using Policy-based Reinforcement Learning and Post-Optimization
AU - Hart, Patrick
AU - Rychly, Leonard
AU - Knoll, Alois
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Many current behavior generation methods struggle to handle real-world traffic situations as they do not scale well with complexity. However, behaviors can be learned offline using data-driven approaches. Reinforcement learning, in particular, is promising as it implicitly learns how to behave from collected experiences. In this work, we combine policy-based reinforcement learning with local optimization to bring together the strengths of both methodologies. The policy-based reinforcement learning algorithm provides an initial solution and guiding reference for the post-optimization. Therefore, the optimizer only has to compute a single homotopy class, e.g. driving behind or in front of the other vehicle. The state history stored during reinforcement learning can be used for constraint checking, allowing the optimizer to account for interactions. The post-optimization additionally acts as a safety layer, so the novel method can be applied in safety-critical applications. We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.
AB - Many current behavior generation methods struggle to handle real-world traffic situations as they do not scale well with complexity. However, behaviors can be learned offline using data-driven approaches. Reinforcement learning, in particular, is promising as it implicitly learns how to behave from collected experiences. In this work, we combine policy-based reinforcement learning with local optimization to bring together the strengths of both methodologies. The policy-based reinforcement learning algorithm provides an initial solution and guiding reference for the post-optimization. Therefore, the optimizer only has to compute a single homotopy class, e.g. driving behind or in front of the other vehicle. The state history stored during reinforcement learning can be used for constraint checking, allowing the optimizer to account for interactions. The post-optimization additionally acts as a safety layer, so the novel method can be applied in safety-critical applications. We evaluate the proposed method using lane-change scenarios with a varying number of vehicles.
UR - http://www.scopus.com/inward/record.url?scp=85076822358&partnerID=8YFLogxK
U2 - 10.1109/ITSC.2019.8917002
DO - 10.1109/ITSC.2019.8917002
M3 - Conference contribution
AN - SCOPUS:85076822358
T3 - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
SP - 3176
EP - 3181
BT - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
Y2 - 27 October 2019 through 30 October 2019
ER -