Simple, Efficient, and Robust Hash Tables for Join Processing

Altan Birler, Tobias Schmidt, Philipp Fent, Thomas Neumann

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

Hash joins play a critical role in relational data processing and their performance is crucial for the overall performance of a database system. Due to the hard to predict nature of intermediate results, an ideal hash join implementation has to be both fast for typical queries and robust against unusual data distributions. In this paper, we present our simple, yet effective unchained in-memory hash table design. Unchained tables combine the techniques of build side partitioning, adjacency array layout, pipelined probes, Bloom filters, and software write-combine buffers to achieve significant improvements in n: m joins with skew, while preserving top-notch performance in 1: n joins. Our hash table outperforms open addressing by 2× on average in relational queries and both chaining and open addressing by up to 20× in graph processing queries.

Original languageEnglish
Title of host publication20th International Workshop on Data Management on New Hardware, DaMoN 2024
PublisherAssociation for Computing Machinery, Inc
ISBN (Electronic)9798400706677
DOIs
StatePublished - 10 Jun 2024
Event20th International Workshop on Data Management on New Hardware, DaMoN 2024 - Santiago, Chile
Duration: 10 Jun 2024 → …

Publication series

Name20th International Workshop on Data Management on New Hardware, DaMoN 2024

Conference

Conference20th International Workshop on Data Management on New Hardware, DaMoN 2024
Country/TerritoryChile
CitySantiago
Period10/06/24 → …

Fingerprint

Dive into the research topics of 'Simple, Efficient, and Robust Hash Tables for Join Processing'. Together they form a unique fingerprint.

Cite this