Abstract
This article addresses the question of mapping building functions jointly using both aerial and street view images via deep learning techniques. One of the central challenges here is determining a data fusion strategy that can cope with heterogeneous image modalities. We demonstrate that geometric combinations of the features of these two image types, especially in the early convolutional layers, often have a destructive effect due to the spatial misalignment of the features. We therefore address this problem through a decision-level fusion of a diverse ensemble of models, each trained independently on one image type. In this way, the significant differences in appearance between aerial and street view images are taken into account. Compared to the common multi-stream end-to-end fusion approaches proposed in the literature, we are able to increase the precision score from 68% to 76%. Another challenge is that the sophisticated classification schemes needed for real applications involve classes that overlap strongly and lack sharp boundaries; as a consequence, classification using machine learning becomes significantly harder. In this work, we choose a highly compact classification scheme with four classes (commercial, residential, public, and industrial), because such a scheme is highly valuable to urban geography, correlating with socio-demographic parameters such as population density and income.
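As a rough illustration of the decision-level fusion described above, the sketch below averages the per-class probabilities produced by two independently trained models, one per image modality, and picks the class with the highest fused score. The variable names, example probabilities, and the simple (weighted) averaging rule are assumptions for illustration; the paper's exact ensemble composition and fusion rule are not reproduced here.

```python
import numpy as np

# Four-class scheme used in the paper: commercial, residential, public, industrial.
CLASSES = ["commercial", "residential", "public", "industrial"]

# Hypothetical softmax outputs for one building from two independently
# trained models (e.g., one on aerial images, one on street view images).
aerial_probs = np.array([0.10, 0.55, 0.20, 0.15])
street_probs = np.array([0.05, 0.70, 0.15, 0.10])

def decision_level_fusion(prob_list, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging."""
    probs = np.stack(prob_list)                  # shape: (n_models, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    fused = np.average(probs, axis=0, weights=weights)
    return fused / fused.sum()                   # renormalize for safety

fused = decision_level_fusion([aerial_probs, street_probs])
print(CLASSES[int(np.argmax(fused))])            # -> "residential"
```

Because each model is trained on its own modality, this late fusion avoids forcing spatially misaligned aerial and street-view features to be combined inside the convolutional layers.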
| Original language | English |
| ---|---|
| Article number | 1259 |
| Journal | Remote Sensing |
| Volume | 11 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - 1 June 2019 |