A practical approach to groupjoin and nested aggregates

Philipp Fent, Thomas Neumann

Research output: Contribution to journalConference articlepeer-review

6 Scopus citations

Abstract

Groupjoins, the combined execution of a join and a subsequent group by, are common in analytical queries, and occur in about 1/8 of the queries in TPC-H and TPC-DS. While they were originally invented to improve performance, efficient parallel execution of groupjoins can be limited by contention, which limits their usefulness in a many-core system. Having an efficient implementation of groupjoins is highly desirable, as groupjoins are not only used to fuse group by and join but are also introduced by the unnesting component of the query optimizer to avoid nested-loops evaluation of aggregates. Furthermore, the query optimizer needs be able to reason over the result of aggregation in order to schedule it correctly. Traditional selectivity and cardinality estimations quickly reach their limits when faced with computed columns from nested aggregates, which leads to poor cost estimations and thus, suboptimal query plans. In this paper, we present techniques to efficiently estimate, plan, and execute groupjoins and nested aggregates. We propose two novel techniques, aggregate estimates to predict the result distribution of aggregates, and parallel groupjoin execution for a scalable execution of groupjoins. The resulting system has significantly better estimates and a contention-free evaluation of groupjoins, which can speed up some TPC-H queries up to a factor of 2.

Original languageEnglish
Pages (from-to)2383-2396
Number of pages14
JournalProceedings of the VLDB Endowment
Volume14
Issue number11
DOIs
StatePublished - 2021
Event47th International Conference on Very Large Data Bases, VLDB 2021 - Virtual, Online
Duration: 16 Aug 202120 Aug 2021

Fingerprint

Dive into the research topics of 'A practical approach to groupjoin and nested aggregates'. Together they form a unique fingerprint.

Cite this