TY - GEN
T1 - ApproxJoin
T2 - 2018 ACM Symposium on Cloud Computing, SoCC 2018
AU - Le Quoc, Do
AU - Akkus, Istemi Ekin
AU - Bhatotia, Pramod
AU - Blanas, Spyros
AU - Chen, Ruichuan
AU - Fetzer, Christof
AU - Strufe, Thorsten
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/10/11
Y1 - 2018/10/11
AB - A distributed join is a fundamental operation for processing massive datasets in parallel. Unfortunately, computing an equi-join over such datasets is very resource-intensive, even when done in parallel. Given this cost, the equi-join operator becomes a natural candidate for optimization using approximation techniques, which allow users to trade accuracy for latency. Finding the right approximation technique for joins, however, is a challenging task. Sampling, in particular, cannot be directly used in joins; naïvely performing a join over a sample of the dataset will not preserve statistical properties of the query result.
KW - Approximate computing
KW - Approximate join processing
KW - Distributed systems
KW - Multi-way joins
KW - Stratified sampling
UR - http://www.scopus.com/inward/record.url?scp=85059015197&partnerID=8YFLogxK
U2 - 10.1145/3267809.3267834
DO - 10.1145/3267809.3267834
M3 - Conference contribution
AN - SCOPUS:85059015197
T3 - SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing
SP - 426
EP - 438
BT - SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing
PB - Association for Computing Machinery, Inc
Y2 - 11 October 2018 through 13 October 2018
ER -