TY - GEN
T1 - AtlasHDF
T2 - 10th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2022
AU - Werner, Martin
AU - Li, Hao
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/11/1
Y1 - 2022/11/1
N2 - The last decade has witnessed rapid development in geospatial applications of artificial intelligence (GeoAI). However, due to misalignment with wider progress in computer science, the geospatial community has long kept working with powerful and over-sophisticated tools and software, whose functionality goes far beyond the actual basic needs of GeoAI tasks. This fact, to a certain extent, hinders our steps towards establishing sustainable and replicable GeoAI models. In this paper, we aim to address this challenge by introducing an efficient big data framework based on modern HDF5 technology, called AtlasHDF, in which we designed lossless data mappings (immediate mapping and analysis-ready mapping) from OpenStreetMap (OSM) vector data into a single HDF5 data container to facilitate fast and flexible GeoAI applications learnt from OSM data. Since HDF5 is included as a default dependency in most GeoAI and high performance computing (HPC) environments, the proposed AtlasHDF provides a cross-platform, single-technology solution for handling heterogeneous big geodata for GeoAI. As a case study, we conducted a comparative analysis of the AtlasHDF framework with three commonly used data formats (i.e., PBF, Shapefile and GeoPackage) using the latest OSM data from the city of Berlin (Germany), then elaborated on the advantages of each data format w.r.t. file size, querying, rendering, dependency, and data extendability. Given the wide range of GeoAI tasks that can potentially benefit from our framework, our future work will focus on extending the framework to heterogeneous big geodata (vector and raster) to support seamless and fast data integration without any geospatial software dependency until the training stage of GeoAI. A reference implementation of the framework developed in this paper is publicly available at: https://github.com/tumbgd/hdf4water.
AB - The last decade has witnessed rapid development in geospatial applications of artificial intelligence (GeoAI). However, due to misalignment with wider progress in computer science, the geospatial community has long kept working with powerful and over-sophisticated tools and software, whose functionality goes far beyond the actual basic needs of GeoAI tasks. This fact, to a certain extent, hinders our steps towards establishing sustainable and replicable GeoAI models. In this paper, we aim to address this challenge by introducing an efficient big data framework based on modern HDF5 technology, called AtlasHDF, in which we designed lossless data mappings (immediate mapping and analysis-ready mapping) from OpenStreetMap (OSM) vector data into a single HDF5 data container to facilitate fast and flexible GeoAI applications learnt from OSM data. Since HDF5 is included as a default dependency in most GeoAI and high performance computing (HPC) environments, the proposed AtlasHDF provides a cross-platform, single-technology solution for handling heterogeneous big geodata for GeoAI. As a case study, we conducted a comparative analysis of the AtlasHDF framework with three commonly used data formats (i.e., PBF, Shapefile and GeoPackage) using the latest OSM data from the city of Berlin (Germany), then elaborated on the advantages of each data format w.r.t. file size, querying, rendering, dependency, and data extendability. Given the wide range of GeoAI tasks that can potentially benefit from our framework, our future work will focus on extending the framework to heterogeneous big geodata (vector and raster) to support seamless and fast data integration without any geospatial software dependency until the training stage of GeoAI. A reference implementation of the framework developed in this paper is publicly available at: https://github.com/tumbgd/hdf4water.
KW - GeoAI
KW - OpenStreetMap
KW - big data
KW - hierarchical data format
KW - immediate mapping
UR - http://www.scopus.com/inward/record.url?scp=85142634945&partnerID=8YFLogxK
U2 - 10.1145/3557917.3567615
DO - 10.1145/3557917.3567615
M3 - Conference contribution
AN - SCOPUS:85142634945
T3 - Proceedings of the 10th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2022
SP - 1
EP - 7
BT - Proceedings of the 10th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2022
A2 - Shashidharan, Ashwin
A2 - Gadiraju, Krishna Karthik
A2 - Chandola, Varun
A2 - Vatsavai, Ranga Raju
PB - Association for Computing Machinery, Inc
Y2 - 1 November 2022
ER -