TY - GEN
T1 - SASPAR
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
AU - Karimov, Jeyhun
AU - Jacobsen, Hans-Arno
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Data partitioning induces network transfers and dominates the cost of stream data analytics. Moreover, partitioning streaming data for multiple stream queries in the same cluster can easily saturate the network bandwidth and lead to high end-to-end latencies. The goal of this paper is to share the partition operation in streaming workloads and maximize the sharing opportunities for multiple stream queries. However, there are several challenges, such as minimizing data copy, optimizing the partitioning strategy for multiple queries, and minimizing latency. We propose SASPAR, Shared Adaptive Stream Partitioner, which is able to share data partitioning among multiple stream queries. Our contributions are threefold. First, we propose a new technique to optimize the partitioning strategy for multiple stream queries. Second, we present an adaptive query execution framework that performs optimizations at run-time, without stopping the query execution plan. Third, we utilize meta-heuristics and machine learning when solving the underlying optimization problem takes more time than expected. SASPAR is designed as a versatile layer to sit on top of a stream processing engine (SPE). We operate SASPAR on top of three state-of-the-art SPEs with hundreds of stream queries. Our experimental results show that SASPAR improves the performance (throughput and latency) of all underlying SPEs by up to 3x.
AB - Data partitioning induces network transfers and dominates the cost of stream data analytics. Moreover, partitioning streaming data for multiple stream queries in the same cluster can easily saturate the network bandwidth and lead to high end-to-end latencies. The goal of this paper is to share the partition operation in streaming workloads and maximize the sharing opportunities for multiple stream queries. However, there are several challenges, such as minimizing data copy, optimizing the partitioning strategy for multiple queries, and minimizing latency. We propose SASPAR, Shared Adaptive Stream Partitioner, which is able to share data partitioning among multiple stream queries. Our contributions are threefold. First, we propose a new technique to optimize the partitioning strategy for multiple stream queries. Second, we present an adaptive query execution framework that performs optimizations at run-time, without stopping the query execution plan. Third, we utilize meta-heuristics and machine learning when solving the underlying optimization problem takes more time than expected. SASPAR is designed as a versatile layer to sit on top of a stream processing engine (SPE). We operate SASPAR on top of three state-of-the-art SPEs with hundreds of stream queries. Our experimental results show that SASPAR improves the performance (throughput and latency) of all underlying SPEs by up to 3x.
KW - Stream processing
KW - shared data partitioning
UR - http://www.scopus.com/inward/record.url?scp=85167669105&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00076
DO - 10.1109/ICDE55515.2023.00076
M3 - Conference contribution
AN - SCOPUS:85167669105
T3 - Proceedings - International Conference on Data Engineering
SP - 922
EP - 935
BT - Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
PB - IEEE Computer Society
Y2 - 3 April 2023 through 7 April 2023
ER -