FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture

Caner Hazirbas, Lingni Ma, Csaba Domokos, Daniel Cremers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

113 Scopus citations

Abstract

In this paper we address the problem of semantic labeling of indoor scenes on RGB-D data. With the availability of RGB-D cameras, it is expected that additional depth measurements will improve the accuracy. Here we investigate how to incorporate complementary depth information into a semantic segmentation framework by making use of convolutional neural networks (CNNs). Recently, encoder-decoder type fully convolutional CNN architectures have achieved great success in the field of semantic segmentation. Motivated by this observation, we propose an encoder-decoder type network, where the encoder part is composed of two branches of networks that simultaneously extract features from RGB and depth images and fuse depth features into the RGB feature maps as the network goes deeper. Comprehensive experimental evaluations demonstrate that the proposed fusion-based architecture achieves results competitive with state-of-the-art methods on the challenging SUN RGB-D benchmark, obtaining 76.27% global accuracy, 48.30% average class accuracy and 37.29% average intersection-over-union score.
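The three scores quoted in the abstract are standard semantic segmentation metrics. As a rough illustration only (not the authors' evaluation code), the sketch below computes them from a pixel-level confusion matrix. The function name `segmentation_metrics`, the toy two-class matrix, and the choice to skip classes that never occur are assumptions made for this example; the benchmark's exact protocol may differ.

```python
from typing import List, Tuple


def segmentation_metrics(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return (global accuracy, average class accuracy, mean IoU).

    confusion[i][j] counts pixels whose true class is i and whose
    predicted class is j. Hypothetical helper for illustration.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")

    correct = sum(confusion[i][i] for i in range(n))
    class_accuracies = []
    ious = []
    for i in range(n):
        true_positive = confusion[i][i]
        ground_truth = sum(confusion[i])                     # pixels truly of class i
        predicted = sum(confusion[r][i] for r in range(n))   # pixels predicted as class i
        union = ground_truth + predicted - true_positive
        # Assumption: classes absent from the data are left out of the averages.
        if ground_truth > 0:
            class_accuracies.append(true_positive / ground_truth)
        if union > 0:
            ious.append(true_positive / union)

    global_accuracy = correct / total
    average_class_accuracy = sum(class_accuracies) / len(class_accuracies)
    mean_iou = sum(ious) / len(ious)
    return global_accuracy, average_class_accuracy, mean_iou


# Toy two-class example (made-up numbers, not results from the paper).
toy_confusion = [[8, 2],
                 [1, 9]]
global_acc, class_acc, mean_iou = segmentation_metrics(toy_confusion)
```

Global accuracy weights every pixel equally, so frequent classes dominate it, while average class accuracy and mean intersection-over-union weight every class equally. This is one common reason the latter two scores come out lower than the first on datasets with imbalanced classes.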

Original language: English
Title of host publication: Computer Vision - ACCV 2016 - 13th Asian Conference on Computer Vision, Revised Selected Papers
Editors: Yoichi Sato, Ko Nishino, Vincent Lepetit, Shang-Hong Lai
Publisher: Springer Verlag
ISBN (Print): 9783319541808
State: Published - 2017
Event: 13th Asian Conference on Computer Vision, ACCV 2016 - Taipei, Taiwan, Province of China
Duration: 20 Nov 2016 – 24 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10111 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th Asian Conference on Computer Vision, ACCV 2016
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 20/11/16 – 24/11/16
