TY - JOUR
T1 - GloBiMapsAI
T2 - An AI-Enhanced Probabilistic Data Structure for Global Raster Datasets
AU - Werner, Martin
N1 - Publisher Copyright: © 2021 Association for Computing Machinery.
PY - 2021/12
Y1 - 2021/12
N2 - In the last decade, more and more spatial data has been acquired on a global scale due to satellite missions, social media, and coordinated governmental activities. This observational data suffers from huge storage footprints and makes global analysis challenging. Therefore, many information products have been designed in which observations are turned into global maps showing features such as land cover or land use, often with only a few discrete values and sparse spatial coverage like only within cities. Traditional coding of such data as a raster image becomes challenging due to the sizes of the datasets and spatially non-local access patterns, for example, when labeling social media streams. This article proposes GloBiMap, a randomized data structure, based on Bloom filters, for modeling low-cardinality sparse raster images of excessive sizes in a configurable amount of memory with pure random access operations avoiding costly intermediate decompression. In addition, the data structure is designed to correct the inevitable errors of the randomized layer in order to have a fully exact representation. We show the feasibility of the approach on several real-world datasets including the Global Urban Footprint in which each pixel denotes whether a particular location contains a building at a resolution of roughly 10m globally as well as on a global Twitter sample of more than 220 million precisely geolocated tweets. In addition, we propose the integration of a denoiser engine based on artificial intelligence in order to reduce the amount of error correction information for extremely compressive GloBiMaps.
KW - Image representation
KW - data sparsity and compression
KW - geographic information systems
KW - machine learning
KW - randomized data structures
UR - http://www.scopus.com/inward/record.url?scp=85122593687&partnerID=8YFLogxK
U2 - 10.1145/3453184
DO - 10.1145/3453184
M3 - Article
AN - SCOPUS:85122593687
SN - 2374-0353
VL - 7
JO - ACM Transactions on Spatial Algorithms and Systems
JF - ACM Transactions on Spatial Algorithms and Systems
IS - 4
M1 - 18
ER -