Approxjoin: Approximate distributed joins

Do Le Quoc, Istemi Ekin Akkus, Pramod Bhatotia, Spyros Blanas, Ruichuan Chen, Christof Fetzer, Thorsten Strufe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

A distributed join is a fundamental operation for processing massive datasets in parallel. Unfortunately, computing an equi-join over such datasets is very resource-intensive, even when done in parallel. Given this cost, the equi-join operator becomes a natural candidate for optimization using approximation techniques, which allow users to trade accuracy for latency. Finding the right approximation technique for joins, however, is a challenging task. Sampling, in particular, cannot be directly used in joins: naïvely performing a join over a sample of the dataset will not preserve the statistical properties of the query result.
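The abstract's point that naïvely joining sampled inputs distorts the result can be made concrete with a small simulation. This is a minimal sketch, not ApproxJoin's technique: it assumes two synthetic relations keyed by integers, independent Bernoulli sampling of each input at rate p, and hypothetical helper names (`join_size`, `bernoulli_sample`, `demo`). A join output tuple survives only if both of its input tuples are sampled, so it survives with probability p², and rescaling the sampled join size by 1/p (as one would for a single sampled table) underestimates the true size by roughly a factor of p.

```python
import random
from collections import Counter


def join_size(left_keys, right_keys):
    """Size of the equi-join on the key: sum over keys of count_left * count_right."""
    right_counts = Counter(right_keys)
    # Counter returns 0 for keys absent from the right input.
    return sum(right_counts[k] for k in left_keys)


def bernoulli_sample(keys, p, rng):
    """Keep each input tuple independently with probability p."""
    return [k for k in keys if rng.random() < p]


def demo(p=0.5, n=10_000, num_keys=100, seed=42):
    """Hypothetical experiment: compare the true join size with estimates from sampled inputs."""
    rng = random.Random(seed)
    # Two synthetic relations whose join keys are drawn uniformly from num_keys values.
    left = [rng.randrange(num_keys) for _ in range(n)]
    right = [rng.randrange(num_keys) for _ in range(n)]

    true_size = join_size(left, right)
    sampled = join_size(bernoulli_sample(left, p, rng), bernoulli_sample(right, p, rng))

    naive = sampled / p            # treats the result as if it were a single p-sample
    corrected = sampled / (p * p)  # each output tuple survives with probability p * p
    return true_size, naive, corrected


if __name__ == "__main__":
    true_size, naive, corrected = demo()
    print(f"naive / true:     {naive / true_size:.3f}")
    print(f"corrected / true: {corrected / true_size:.3f}")
```

With the default p = 0.5, `naive / true_size` lands near 0.5 while `corrected / true_size` lands near 1.0. Even the 1/p² correction only fixes the expected count: output tuples that share an input tuple are kept or dropped together, so they are correlated, which is one reason a join over samples does not behave like a uniform sample of the join result.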

Original language: English
Title of host publication: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 426-438
Number of pages: 13
ISBN (Electronic): 9781450360111
State: Published - 11 Oct 2018
Externally published: Yes
Event: 2018 ACM Symposium on Cloud Computing, SoCC 2018 - Carlsbad, United States
Duration: 11 Oct 2018 to 13 Oct 2018

Publication series

Name: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing

Conference

Conference: 2018 ACM Symposium on Cloud Computing, SoCC 2018
Country/Territory: United States
City: Carlsbad
Period: 11/10/18 to 13/10/18

Keywords

  • Approximate computing
  • Approximate join processing
  • Distributed systems
  • Multi-way joins
  • Stratified sampling
