TY - GEN
T1 - Spatial data locality in scalable and fault-tolerant distributed spatial computing systems
AU - Werner, Martin
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/11/6
Y1 - 2018/11/6
AB - In the last decade, spatial datasets started to grow from small collections of high quality geospatial information into huge collections of data covering the whole planet with varying formats and qualities. Large-scale spatial datasets are about to create significant value in varying application fields including navigation, autonomous driving, urban geography, agriculture, and climate research. Therefore, large datasets are actively acquired. In addition, social networks such as Facebook, Twitter, and Flickr provide text, video, and images with associated geospatial information from the crowd. These sources are highly interesting as they provide near-realtime insights into aspects of human behavior and dynamics. Finally, global and long-running satellite missions such as Landsat, Sentinel, WorldView, or TerraSAR add large amounts of geospatial information. It is a matter of fact that these data collections are putting challenges to the computational infrastructure used for spatial computing. Not only do we need a lot of computation, we also need to think about how to organize and design distributed systems that can help tackle the volume, velocity, and variety of current and future geospatial datasets. Modern big data systems employ data replication for two main reasons: first, for increased fault tolerance, and, second, for higher flexibility in scheduling tasks across a large cluster of machines. This paper proposes and compares novel data replication schemata for scalable spatial computing and analyzes the impact on the communication complexity of global spatial joins of a large collection of tweets collected from the Twitter API and building polygons extracted from OpenStreetMap.
KW - Data Replication and Distribution
KW - Spatial Join
KW - Spatial Big Data
UR - http://www.scopus.com/inward/record.url?scp=85060582621&partnerID=8YFLogxK
U2 - 10.1145/3282834.3282837
DO - 10.1145/3282834.3282837
M3 - Conference contribution
AN - SCOPUS:85060582621
T3 - Proceedings of the 7th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2018
SP - 47
EP - 56
BT - Proceedings of the 7th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2018
PB - Association for Computing Machinery, Inc
T2 - 7th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2018
Y2 - 6 November 2018 through 6 November 2018
ER -