TY - JOUR
T1 - Robust Join Processing with Diamond Hardened Joins
AU - Birler, Altan
AU - Kemper, Alfons
AU - Neumann, Thomas
N1 - Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
PY - 2024
Y1 - 2024
AB - Join ordering and join processing have a huge impact on query execution and can easily affect query response time by orders of magnitude. In particular, when joins are potentially growing n:m joins, execution can be very expensive. This can be seen by examining the sizes of intermediate results: if a join query produces many redundant tuples that are later eliminated, the query is likely expensive, and this cost is not justified by the query result. This gives the query a diamond shape, with intermediate results larger than both the inputs and the output. This occurs frequently in various workloads, particularly graph workloads, and also in benchmarks such as JOB. We call this issue the diamond problem, and to address it, we propose the diamond hardened join framework, which splits join operators into two sub-operators: Lookup & Expand. By allowing these sub-operators to be freely reordered by the query optimizer, we improve the runtime of queries that exhibit the diamond problem without sacrificing performance for the remaining queries. Past theoretical work such as worst-case optimal joins similarly tries to avoid huge intermediate results; however, these approaches have significant overheads that impact all queries. We demonstrate that our approach leads to excellent performance both on queries that exhibit the diamond problem and on regular queries that can be handled by traditional binary joins. This allows for a unified approach, offering excellent performance across the board. Compared to traditional joins, query performance improves by up to 500x in the CE benchmark and remains excellent in TPC-H and JOB.
UR - http://www.scopus.com/inward/record.url?scp=85205352486&partnerID=8YFLogxK
U2 - 10.14778/3681954.3681995
DO - 10.14778/3681954.3681995
M3 - Conference article
AN - SCOPUS:85205352486
SN - 2150-8097
VL - 17
SP - 3215
EP - 3228
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 25 August 2024 through 29 August 2024
ER -