Vision Language Models in Autonomous Driving: A Survey and Outlook

Xingcheng Zhou, Mingyu Liu, Ekim Yurtsever, Bare Luka Zagar, Walter Zimmer, Hu Cao, Alois C. Knoll

Research output: Contribution to journal › Article › peer-review

Abstract

The applications of Vision-Language Models (VLMs) in the field of Autonomous Driving (AD) have attracted widespread attention due to their outstanding performance and their ability to leverage Large Language Models (LLMs). By integrating language data, driving systems can gain a deeper understanding of real-world environments, improving driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in language models in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks and the commonly utilized metrics. Additionally, we review current studies and applications in various areas and thoroughly summarize the existing language-enhanced autonomous driving datasets. Finally, we discuss the benefits and challenges of VLMs in AD and point researchers to the current research gaps and future trends. Project page: https://github.com/ge25nab/Awesome-VLM-AD-ITS
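To make the abstract's notion of "integrating language data" concrete, below is a minimal illustrative sketch (not taken from the survey) of one mainstream VLM task it mentions, visual question answering, applied to a driving scene. It assumes an off-the-shelf BLIP VQA model loaded via the HuggingFace transformers library; the image path and question are placeholder assumptions.

```python
# Minimal sketch: visual question answering on a driving-scene image.
# Assumes an off-the-shelf VLM (BLIP VQA) from HuggingFace transformers;
# the image path and question below are illustrative placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# A front-camera frame from a driving log (placeholder path).
image = Image.open("front_camera_frame.jpg").convert("RGB")
question = "Is there a pedestrian crossing ahead?"

# Encode the image-question pair and generate a free-form textual answer.
inputs = processor(image, question, return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

A grounded natural-language answer of this kind is what downstream planning or decision-making modules surveyed in the paper would consume.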

Original language: English
Pages (from-to): 1-20
Number of pages: 20
Journal: IEEE Transactions on Intelligent Vehicles
DOIs
State: Accepted/In press - 2024

Keywords

  • Autonomous Driving
  • Autonomous vehicles
  • Computational modeling
  • Conditional Data Generation
  • Data models
  • Decision Making
  • End-to-End Autonomous Driving
  • Intelligent Vehicle
  • Language-guided Navigation
  • Large Language Model
  • Planning
  • Surveys
  • Task analysis
  • Vision Language Model
  • Visualization
