TY - GEN
T1 - GloBiMaps - a probabilistic data structure for in-memory processing of global raster datasets
AU - Werner, Martin
N1 - Publisher Copyright: © 2019 Copyright held by the owner/author(s).
PY - 2019/11/5
Y1 - 2019/11/5
N2 - In the last decade, more and more spatial data has been acquired on a global scale due to satellite missions, social media, and coordinated governmental activities. This observational data suffers from huge storage footprints, which makes global analysis challenging. Therefore, many information products have been designed in which observations are turned into global maps showing features such as land cover or land use, often with only a few discrete values and sparse spatial coverage, for example only within cities. Traditional coding of such data as a raster image becomes challenging due to the sizes of the datasets and spatially non-local access patterns, for example, when labeling social media streams. This paper proposes GloBiMap, a randomized data structure based on Bloom filters, for modeling low-cardinality sparse raster images of excessive sizes in a configurable amount of memory with pure random access operations, avoiding costly intermediate decompression. In addition, the data structure is designed to correct the inevitable errors of the randomized layer in order to have a fully exact representation. We show the feasibility of the approach on several real-world datasets, including the Global Urban Footprint, in which each pixel denotes whether a particular location contains a building at a resolution of roughly 10 m globally, as well as on a global Twitter sample of more than 220 million precisely geolocated tweets.
KW - Data Sparsity and Compression
KW - Geographic Information Systems
KW - Image Representation
KW - Randomized Data Structures
UR - http://www.scopus.com/inward/record.url?scp=85076945145&partnerID=8YFLogxK
U2 - 10.1145/3347146.3359086
DO - 10.1145/3347146.3359086
M3 - Conference contribution
AN - SCOPUS:85076945145
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
SP - 3
EP - 12
BT - 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2019
A2 - Banaei-Kashani, Farnoush
A2 - Trajcevski, Goce
A2 - Güting, Ralf Hartmut
A2 - Kulik, Lars
A2 - Newsam, Shawn
PB - Association for Computing Machinery
T2 - 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2019
Y2 - 5 November 2019 through 8 November 2019
ER -