Locality-sensitive operators for parallel main-memory database clusters

Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, Alfons Kemper, Thomas Neumann

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

42 citations (Scopus)

Abstract

The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key, but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the network phase duration; (ii) communication scheduling avoids bandwidth underutilization due to cross traffic; (iii) adaptive radix partitioning retains locality during data repartitioning and handles value skew gracefully; and (iv) selective broadcast reduces network communication in the presence of extreme value skew or large numbers of duplicates. We present comprehensive experimental results, which show that our techniques can improve performance by up to a factor of 5 for fuzzy co-location and a factor of 3 for inputs with value skew.
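The idea behind technique (i), assigning each partition to a node that already stores much of it, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract states that the optimal assignment reduces the network phase duration, whereas the greedy rule below only reduces the total number of tuples sent. The names `assign_partitions` and `tuples_sent`, and the example counts, are hypothetical and chosen for illustration only.

```python
from typing import List


def assign_partitions(counts: List[List[int]]) -> List[int]:
    """Greedy locality-aware assignment (illustrative, not the paper's method).

    counts[node][partition] is the number of tuples of that partition
    stored on that node. Each partition is assigned to the node that
    already holds the most tuples of it (ties go to the lowest node id).
    """
    num_nodes = len(counts)
    num_partitions = len(counts[0])
    return [
        max(range(num_nodes), key=lambda node: counts[node][p])
        for p in range(num_partitions)
    ]


def tuples_sent(counts: List[List[int]], assignment: List[int]) -> int:
    """Total tuples that must cross the network for a given assignment."""
    return sum(
        counts[node][p]
        for node in range(len(counts))
        for p in range(len(assignment))
        if assignment[p] != node
    )


# Hypothetical input with fuzzy co-location: most tuples of partition i
# already reside on node i.
counts = [
    [90, 5, 10],
    [5, 80, 10],
    [5, 15, 80],
]

local = assign_partitions(counts)
print(local)                          # [0, 1, 2]
print(tuples_sent(counts, local))     # 50
# A locality-oblivious assignment moves far more data:
print(tuples_sent(counts, [1, 2, 0])) # 270
```

In this example the locality-aware assignment ships 50 tuples instead of 270. A greedy rule like this can overload a single node, which is one reason an assignment that targets the duration of the network phase, as the abstract describes, would need a different formulation.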

Original language: English
Title of host publication: 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
Publisher: IEEE Computer Society
Pages: 592-603
Number of pages: 12
ISBN (Print): 9781479925544
DOIs
Publication status: Published - 2014
Event: 30th IEEE International Conference on Data Engineering, ICDE 2014 - Chicago, IL, United States
Duration: 31 March 2014 to 4 April 2014

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Conference

Conference: 30th IEEE International Conference on Data Engineering, ICDE 2014
Country/Territory: United States
City: Chicago, IL
Period: 31/03/14 to 4/04/14
