A review and quantitative evaluation of direct visual-inertial odometry

Lukas Von Stumberg, Vladyslav Usenko, Daniel Cremers

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Simultaneous localization and mapping (SLAM) is an integral part of scene understanding. Two sensors have proven particularly useful for this task in combination: cameras and inertial measurement units (IMUs). In recent years many powerful and precise purely visual SLAM methods have been proposed, yet all of them lack robustness, especially under fast motion and in textureless areas. This is where IMUs are a great addition: they provide short-term motion constraints and, in contrast to cameras, their measurements do not contain outliers. To complement vision optimally, IMU data needs to be injected into the vision algorithm at a deep layer, resulting in a tight integration. In this chapter we review the fundamentals of visual-inertial sensor fusion and explain a current state-of-the-art method called visual-inertial direct sparse odometry (VI-DSO). VI-DSO jointly estimates camera poses and sparse scene geometry by minimizing photometric and IMU measurement errors in a combined energy functional. IMU information is accumulated between several frames using measurement preintegration and inserted into the optimization as an additional constraint. In VI-DSO, scale and gravity direction are explicitly included in the model and jointly optimized together with other variables such as poses. As the scale is often not immediately observable using IMU data, this allows the visual-inertial system to be initialized with an arbitrary scale instead of having to delay initialization until everything is observable. We also cover partial marginalization of old variables, which keeps the computation time bounded as the environment is explored, and dynamic marginalization, which allows partial marginalization to be used even when the initial scale estimate is far from the optimum. We evaluate our system and compare it to other visual-inertial methods on a publicly available dataset.
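
To make the preintegration step concrete, here is a minimal sketch of how gyroscope and accelerometer samples between two camera frames can be accumulated into a single relative-motion constraint. It is an illustration only, not the VI-DSO implementation: it uses simple Euler integration, assumes bias-corrected measurements and a fixed sample interval, and omits the covariance and bias-Jacobian bookkeeping that a real system maintains.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix so that hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues formula: map a rotation vector to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-10:
        return np.eye(3) + hat(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def preintegrate(imu_samples, dt):
    """Accumulate IMU samples between two camera frames into a single
    relative rotation, velocity and position increment.

    imu_samples: iterable of (gyro, accel) pairs in the body frame,
    already bias-corrected (an assumption of this sketch).
    Returns (dR, dv, dp) expressed in the frame of the first camera.
    """
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for gyro, accel in imu_samples:
        dp = dp + dv * dt + 0.5 * (dR @ accel) * dt**2
        dv = dv + (dR @ accel) * dt
        dR = dR @ exp_so3(gyro * dt)
    return dR, dv, dp
```

Because the increment depends only on the IMU samples (and, in a full treatment, linearly on the bias estimates), it can be computed once and reused across optimizer iterations, which is the point of preintegration.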

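The combined energy functional sums photometric residuals with an inertial term that compares the preintegrated increment against the relative motion predicted by the current state estimates. The sketch below shows one plausible form of such an inertial residual, with the scale (parametrized by its logarithm so it stays positive) and the gravity direction as explicit optimization variables, as in VI-DSO; the function name, state layout, and small-angle rotation error are illustrative assumptions, not the chapter's API.

```python
import numpy as np

GRAVITY = 9.81  # magnitude of gravity; its direction is estimated

def imu_residual(state_i, state_j, preint, dt_ij, g_dir, log_scale):
    """Residual between the relative motion predicted by the states and
    the preintegrated IMU increment (dR, dv, dp).

    Poses and velocities live in the (unscaled) visual frame; exp(log_scale)
    and the unit gravity direction g_dir map them into the metric IMU frame.
    """
    R_i, p_i, v_i = state_i
    R_j, p_j, v_j = state_j
    s = np.exp(log_scale)          # current scale estimate
    g = GRAVITY * g_dir            # gravity vector in the world frame
    dR, dv, dp = preint
    # Rotation error as a rotation vector; vee of the skew part is used
    # here as a small-angle approximation of the SO(3) log map.
    E = dR.T @ R_i.T @ R_j
    r_rot = 0.5 * np.array([E[2, 1] - E[1, 2],
                            E[0, 2] - E[2, 0],
                            E[1, 0] - E[0, 1]])
    r_vel = R_i.T @ (s * (v_j - v_i) - g * dt_ij) - dv
    r_pos = R_i.T @ (s * (p_j - p_i) - s * v_i * dt_ij
                     - 0.5 * g * dt_ij**2) - dp
    return np.concatenate([r_rot, r_vel, r_pos])
```

Partial marginalization removes old poses and points from the active window while retaining their information as a prior on the remaining variables. A standard way to realize this, and a minimal sketch of the step described above, is the Schur complement on the Gauss-Newton system H x = b:

```python
import numpy as np

def marginalize(H, b, keep, marg):
    """Eliminate the variables indexed by `marg` from H x = b via the
    Schur complement, returning a prior on the variables in `keep`."""
    Hkk = H[np.ix_(keep, keep)]
    Hkm = H[np.ix_(keep, marg)]
    Hmm = H[np.ix_(marg, marg)]
    Hmm_inv = np.linalg.inv(Hmm + 1e-9 * np.eye(len(marg)))  # regularized
    H_prior = Hkk - Hkm @ Hmm_inv @ Hkm.T
    b_prior = b[keep] - Hkm @ Hmm_inv @ b[marg]
    return H_prior, b_prior
```

Such a prior is only valid near its linearization point; dynamic marginalization, as described in the abstract, addresses this by keeping the prior usable even while a poorly initialized scale estimate is still converging, roughly by maintaining alternative priors and discarding those whose linearization point becomes inconsistent with the current scale.
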
Original language: English
Title of host publication: Multimodal Scene Understanding
Subtitle of host publication: Algorithms, Applications and Deep Learning
Publisher: Elsevier
Pages: 159-198
Number of pages: 40
ISBN (Electronic): 9780128173589
DOIs
State: Published - 1 Jan 2019

Keywords

  • Sensor fusion
  • SLAM
  • Visual-inertial odometry
