Abstract
In this paper, we present a new task: referring image segmentation for remote sensing data, which aims to segment the specific objects referred to by a natural language expression. Because no dataset exists for this task, we construct one based on the SkyScapes dataset. Our dataset uses linguistically structured expressions that cover object categories, attributes, and spatial relationships, enabling binary masks to be generated from semantic segmentation maps. To benchmark the task, we evaluate and compare three convolutional neural network (CNN)-based methods and a Transformer-based method. The experimental results provide valuable insights into how well these methods adapt to remote sensing data and highlight the potential of our dataset as a resource for the remote sensing community to further explore vision-language tasks.
| Original language | English |
| --- | --- |
| Pages | 946-949 |
| Number of pages | 4 |
| DOIs | |
| State | Published - 2024 |
| Event | 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024 - Athens, Greece. Duration: 7 Jul 2024 → 12 Jul 2024 |
Conference
| Conference | 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024 |
| --- | --- |
| Country/Territory | Greece |
| City | Athens |
| Period | 7/07/24 → 12/07/24 |
Keywords
- Referring image segmentation
- remote sensing
- vision-language task