SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?

John McCormac, Ankur Handa, Stefan Leutenegger, Andrew J. Davison

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

214 Scopus citations

Abstract

We introduce SceneNet RGB-D, a dataset providing pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection. It also provides perfect camera poses and depth data, allowing investigation into geometric computer vision problems such as optical flow, camera pose estimation, and 3D scene labelling tasks. Random sampling permits virtually unlimited scene configurations, and here we provide 5M rendered RGB-D images from 16K randomly generated 3D trajectories in synthetic layouts, with random but physically simulated object configurations. We compare the semantic segmentation performance of network weights produced from pretraining on RGB images from our dataset against generic VGG-16 ImageNet weights. After fine-tuning on the SUN RGB-D and NYUv2 real-world datasets we find in both cases that the synthetically pre-trained network outperforms the VGG-16 weights. When synthetic pre-training includes a depth channel (something ImageNet cannot natively provide) the performance is greater still. This suggests that large-scale high-quality synthetic RGB datasets with task-specific labels can be more useful for pretraining than real-world generic pre-training such as ImageNet. We host the dataset at http://robotvault. bitbucket.io/scenenet-rgbd.html.

Original languageEnglish
Title of host publicationProceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2697-2706
Number of pages10
ISBN (Electronic)9781538610329
DOIs
StatePublished - 22 Dec 2017
Externally publishedYes
Event16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy
Duration: 22 Oct 201729 Oct 2017

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2017-October
ISSN (Print)1550-5499

Conference

Conference16th IEEE International Conference on Computer Vision, ICCV 2017
Country/TerritoryItaly
CityVenice
Period22/10/1729/10/17

Fingerprint

Dive into the research topics of 'SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?'. Together they form a unique fingerprint.

Cite this