Communication-Optimal Parallel Reservoir Sampling

Christian Winter, Moritz Sichert, Altan Birler, Thomas Neumann, Alfons Kemper

Publication: Contribution to book/report/proceedings › Conference contribution › Peer-reviewed

Abstract

When evaluating complex analytical queries on high-velocity data streams, many systems cannot run those queries on all elements of a stream. Sampling is a widely used method to reduce the system load by replacing the input with a representative yet manageable subset. For unbounded data, reservoir sampling generates a fixed-size uniform sample independent of the input cardinality. However, the collection of reservoir samples itself can already be a bottleneck for high-velocity data. In this paper, we introduce a technique that fully parallelizes reservoir sampling on many-core architectures. Our approach relies on the efficient combination of thread-local samples taken over chunks of the input, without requiring communication during the sampling phase and with minimal communication when merging. We show how our efficient merge guarantees uniform random samples while allowing data to be distributed arbitrarily across worker threads. Our analysis of this approach within the Umbra database system demonstrates linear scaling with the number of available threads and the ability to sustain high-velocity workloads.
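
The abstract describes sampling each input chunk independently and combining the thread-local reservoirs only at the end. The Python sketch below illustrates that general idea under stated assumptions: each worker runs the classic Algorithm R on its chunk, and two reservoirs are merged by first drawing a hypergeometric count of how many items come from each side, then subsampling each reservoir uniformly. This is a standard textbook construction chosen for illustration, not necessarily the merge implemented in Umbra; the function names `reservoir_sample` and `merge_samples` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Sample = Tuple[List[T], int]  # (reservoir, number of items seen)


def reservoir_sample(chunk: Sequence[T], k: int, rng: random.Random) -> Sample:
    """Algorithm R: uniform sample of up to k items from one input chunk."""
    reservoir: List[T] = []
    seen = 0
    for item in chunk:
        seen += 1
        if len(reservoir) < k:
            reservoir.append(item)          # fill phase
        else:
            j = rng.randrange(seen)         # uniform in [0, seen)
            if j < k:
                reservoir[j] = item         # replace with probability k/seen
    return reservoir, seen


def merge_samples(a: Sample, b: Sample, k: int, rng: random.Random) -> Sample:
    """Merge two uniform samples of disjoint inputs into one uniform sample."""
    res_a, n_a = a
    res_b, n_b = b
    total = n_a + n_b
    size = min(k, total)
    # Simulate drawing `size` items without replacement from the union to
    # decide how many come from A (a hypergeometric count).
    take_a = 0
    for i in range(size):
        if rng.randrange(total - i) < n_a - take_a:
            take_a += 1
    take_b = size - take_a
    # A uniform subset of a uniform subset is again uniform.
    merged = rng.sample(res_a, take_a) + rng.sample(res_b, take_b)
    return merged, total


# Usage: four hypothetical workers, each sampling its own chunk.
rng = random.Random(42)
data = list(range(1000))
chunks = [data[i:i + 250] for i in range(0, 1000, 250)]
samples = [reservoir_sample(c, 10, rng) for c in chunks]
result = samples[0]
for s in samples[1:]:
    result = merge_samples(result, s, 10, rng)
print(len(result[0]), result[1])  # 10 1000
```

In this sketch each worker reports only its reservoir and its item count at merge time, so no coordination is needed while sampling. Because every merge produces a uniform sample of the union of its inputs, reservoirs can be merged pairwise in any order or as a tree, and chunks may have arbitrary, uneven sizes.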

Original language: English
Title: Datenbanksysteme für Business, Technologie und Web, BTW 2023
Editors: Birgitta König-Ries, Stefanie Scherzinger, Wolfgang Lehner, Gottfried Vossen
Publisher: Gesellschaft für Informatik (GI)
Pages: 567-578
Number of pages: 12
ISBN (electronic): 9783885797258
Publication status: Published - 2023
Event: 2023 Datenbanksysteme für Business, Technologie und Web, BTW 2023 (2023 Database Systems for Business, Technology and Web) - Dresden, Germany
Duration: 6 March 2023 - 10 March 2023

Publication series

Name: Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI)
Volume: P-331
ISSN (print): 1617-5468

Conference

Conference: 2023 Datenbanksysteme für Business, Technologie und Web, BTW 2023 (2023 Database Systems for Business, Technology and Web)
Country/Territory: Germany
City: Dresden
Period: 6 March 2023 - 10 March 2023
