Abstract
We propose an online RGB-D-based scene understanding method for indoor scenes that runs in real time on mobile devices. First, we incrementally reconstruct the scene via simultaneous localization and mapping and compute a three-dimensional (3-D) geometric segmentation by fusing the segments obtained from each input depth image into a global 3-D model. We combine this geometric segmentation with semantic annotations to obtain a semantic segmentation in the form of a semantic map. To make semantic segmentation efficient, we encode each segment in the global model with a fast incremental 3-D descriptor and use a random forest to determine its semantic label. The predictions from successive frames are then fused to obtain a confident semantic class over time. As a result, the overall method achieves an accuracy close to that of state-of-the-art 3-D scene understanding methods while being much more efficient, enabling real-time execution on low-power embedded systems.
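The temporal fusion step can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the code below assumes a simple one: the per-frame class probabilities predicted for a segment are treated as independent and combined by summing their log-probabilities. The function name `fuse_predictions`, the `eps` floor, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math


def fuse_predictions(frame_probs, eps=1e-6):
    """Fuse per-frame class probabilities for a single segment.

    Assumption (not from the paper): frames are treated as independent,
    so the fused score of a class is the product of its per-frame
    probabilities, computed in log space for numerical stability.
    """
    num_classes = len(frame_probs[0])
    log_scores = [0.0] * num_classes
    for probs in frame_probs:
        for c, p in enumerate(probs):
            # Floor each probability so a single zero cannot veto a class.
            log_scores[c] += math.log(max(p, eps))

    # Normalize back to a probability distribution (log-sum-exp trick).
    top = max(log_scores)
    exp_scores = [math.exp(s - top) for s in log_scores]
    total = sum(exp_scores)
    fused = [e / total for e in exp_scores]

    label = max(range(num_classes), key=lambda c: fused[c])
    return label, fused


# Hypothetical classifier outputs for one segment over three frames.
frames = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.7, 0.2, 0.1],
]
label, fused = fuse_predictions(frames)
```

In this example the second frame on its own favors class 1, but the fused estimate still selects class 0, which shows how accumulating evidence across frames can smooth out a single-frame disagreement.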
Original language | English |
---|---|
Article number | 8403286 |
Pages (from-to) | 3402-3409 |
Number of pages | 8 |
Journal | IEEE Robotics and Automation Letters |
Volume | 3 |
Issue number | 4 |
State | Published - Oct 2018 |
Keywords
- RGB-D perception
- SLAM
- Semantic scene understanding