TY - JOUR
T1 - Parallel Processing Strategies for Big Geospatial Data
AU - Werner, Martin
N1 - Publisher Copyright:
Copyright © 2019 Werner.
PY - 2019/12/3
Y1 - 2019/12/3
N2 - This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data hindering full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in manydifferent ways.
AB - This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data hindering full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in manydifferent ways.
KW - MapReduce
KW - big data
KW - cloud computing
KW - spatial HPC
KW - spatial computing models
UR - http://www.scopus.com/inward/record.url?scp=85117749026&partnerID=8YFLogxK
U2 - 10.3389/fdata.2019.00044
DO - 10.3389/fdata.2019.00044
M3 - Review article
AN - SCOPUS:85117749026
SN - 2624-909X
VL - 2
JO - Frontiers in Big Data
JF - Frontiers in Big Data
M1 - 44
ER -