Unsupervised Single-Scene Semantic Segmentation for Earth Observation

Sudipan Saha, Muhammad Shahzad, Lichao Mou, Qian Song, Xiao Xiang Zhu

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Earth observation data have huge potential to enrich our knowledge about our planet. An important step in many Earth observation tasks is semantic segmentation. Generally, a large number of pixelwise labeled images are required to train deep models for supervised semantic segmentation. However, strong intersensor and geographic variations impede the availability of annotated training data in Earth observation. In practice, most Earth observation tasks use only the target scene without assuming the availability of any additional scene, labeled or unlabeled. Keeping such constraints in mind, we propose a semantic segmentation method that learns to segment from a single scene, without using any annotation. Earth observation scenes are generally larger than those encountered in typical computer vision datasets. Exploiting this, the proposed method samples smaller unlabeled patches from the scene. For each patch, an alternate view is generated by simple transformations, e.g., addition of noise. Both views are then processed through a two-stream network, and the weights are iteratively refined using deep clustering, spatial consistency, and contrastive learning in the pixel space. The proposed model automatically segregates the major classes present in the scene and produces the segmentation map. Extensive experiments on four Earth observation datasets collected by different sensors show the effectiveness of the proposed method. Implementation is available at https://gitlab.lrz.de/ai4eo/cd/-/tree/main/unsupContrastiveSemanticSeg.
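The patch sampling and alternate-view step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); the function name `sample_patch_views`, the patch size, the number of patches, and the Gaussian noise level are all assumptions chosen for illustration, and the two-stream network and its losses are not shown.

```python
import numpy as np

def sample_patch_views(scene, patch_size=64, num_patches=8, noise_std=0.05, seed=0):
    """Sample random patches from one scene and build a noisy second view of each.

    scene: float array of shape (H, W, C).
    All parameter names and defaults are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    height, width, _ = scene.shape
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must not exceed the scene dimensions")

    patches = []
    for _ in range(num_patches):
        # Pick a random top-left corner so the patch lies fully inside the scene.
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        patches.append(scene[top:top + patch_size, left:left + patch_size])

    view_a = np.stack(patches)
    # Alternate view: the same patches with additive Gaussian noise (assumed transformation).
    view_b = view_a + rng.normal(0.0, noise_std, size=view_a.shape)
    return view_a, view_b

# Synthetic 4-band scene standing in for a real Earth observation image.
scene = np.random.default_rng(1).random((256, 256, 4))
view_a, view_b = sample_patch_views(scene)
print(view_a.shape, view_b.shape)
```

In a full pipeline, the two returned arrays would be fed to the two streams of the network, with pixels at the same location in both views treated as corresponding pairs.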

Original language: English
Article number: 5228011
Journal: IEEE Transactions on Geoscience and Remote Sensing
State: Published - 2022


  • Deep learning
  • self-supervised learning
  • semantic segmentation
  • single-scene training

