Flow-Join: Adaptive skew handling for distributed joins over high-speed networks

Wolf Rödiger, Sam Idicula, Alfons Kemper, Thomas Neumann

Publication: Contribution to book/report/conference proceedings › Conference contribution › Peer-reviewed

42 citations (Scopus)

Abstract

Modern InfiniBand interconnects offer link speeds of several gigabytes per second and a remote direct memory access (RDMA) paradigm for zero-copy network communication. Both are crucial for parallel database systems to achieve scalable distributed query processing, where adding a server to the cluster increases performance. However, unexpected data characteristics threaten the scalability of distributed joins: skew can cause a severe load imbalance in which a single server must process a much larger part of the input than its fair share and thereby slows down the entire distributed query. We introduce Flow-Join, a novel distributed join algorithm that handles attribute value skew with minimal overhead. Flow-Join detects heavy hitters at runtime using small approximate histograms and adapts the redistribution scheme to resolve load imbalances before they impact join performance. Previous approaches often involve expensive analysis phases that slow down distributed join processing even for non-skewed workloads. This is especially true for modern high-speed interconnects, which are too fast to hide the extra computation. Other skew handling approaches require detailed statistics, which are often unavailable or inaccurate for intermediate results. In contrast, Flow-Join uses our novel lightweight skew handling scheme to execute at the full network speed of more than 6 GB/s for InfiniBand 4×FDR, joining a skewed input at 11.5 billion tuples/s with 32 servers. This is 6.8× faster than a standard distributed hash join on the same hardware. At the same time, Flow-Join does not compromise join performance for non-skewed workloads.
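The two runtime mechanisms named in the abstract, detecting heavy hitters with a small approximate histogram and adapting the redistribution scheme, can be sketched as follows. This is a minimal single-process Python sketch under stated assumptions, not the authors' implementation: it assumes the Space-Saving algorithm as the approximate histogram and a selective-broadcast routing rule (tuples of the skewed input with a heavy-hitter key stay on their local server, while matching tuples of the other input are broadcast to all servers). The names `SpaceSaving`, `destinations`, and the threshold value are hypothetical.

```python
from typing import Dict, Hashable, List, Set


class SpaceSaving:
    """Approximate top-k counter (Space-Saving). Assumed stand-in for the
    'small approximate histogram'; tracks at most `capacity` keys."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.total = 0

    def add(self, key: Hashable) -> None:
        self.total += 1
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the key with the smallest count; the new key inherits
            # that count plus one (so estimates may overcount, never undercount).
            # O(capacity) scan for simplicity; real sketches use a stream summary.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[key] = self.counts.pop(victim) + 1

    def heavy_hitters(self, threshold: float) -> List[Hashable]:
        """Keys whose estimated share of all tuples exceeds `threshold`."""
        return [k for k, c in self.counts.items() if c > threshold * self.total]


def destinations(key: Hashable, side: str, heavy: Set[Hashable],
                 local: int, n: int) -> List[int]:
    """Hypothetical selective-broadcast routing over `n` servers.

    side == "skewed": heavy keys stay on the local server.
    side == "other":  tuples with a heavy key go to every server.
    Non-heavy keys are hash-partitioned as in a plain distributed hash join.
    (A real cluster needs a hash function that agrees across servers.)
    """
    if key in heavy:
        return [local] if side == "skewed" else list(range(n))
    return [hash(key) % n]


# Usage: key 7 makes up half of a 1000-tuple input.
sketch = SpaceSaving(capacity=16)
for k in [7] * 500 + list(range(1000, 1500)):
    sketch.add(k)
print(set(sketch.heavy_hitters(threshold=0.1)))  # prints {7}
```

Keeping heavy-hitter tuples of the skewed input local avoids sending them all to the single server that owns their hash partition; the price is broadcasting the (assumed few) matching tuples from the other input.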

Original language: English
Host publication title: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1194-1205
Number of pages: 12
ISBN (electronic): 9781509020195
Publication status: Published, 22 June 2016
Event: 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland
Duration: 16 May 2016 to 20 May 2016

Publication series

Name: 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016

Conference

Conference: 32nd IEEE International Conference on Data Engineering, ICDE 2016
Country/Territory: Finland
City: Helsinki
Period: 16/05/16 to 20/05/16
