TY - JOUR
T1 - Scalable Multiway Stream Joins in Hardware
AU - Najafi, Mohammadreza
AU - Sadoghi, Mohammad
AU - Jacobsen, Hans-Arno
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Efficient real-time analytics are an integral part of an increasing number of data management applications, such as computational targeted advertising, algorithmic trading, and Internet of Things. In this paper, we focus primarily on accelerating stream joins, which are arguably one of the most commonly used and resource-intensive operators in stream processing. We propose a scalable circular pipeline design (Circular-MJ) in hardware to orchestrate a multiway join while minimizing data flow disruption. In this circular design, each new tuple (given its origin stream) starts its processing from a specific join core and passes through all respective join cores in a pipeline sequence to produce the final results. We also present a novel two-stage pipeline stream join (Stashed-MJ) that uses a best-effort buffering technique (referred to as stash) to maintain intermediate results. If an overwrite is detected in the stash, our design automatically resorts to recomputing intermediate results. Finally, we present a parallelized version of our multiway stream join by integrating our proposed pipelines into a parallel unidirectional flow-based architecture (Parallel-MJ). Our experimental results demonstrate a linear throughput scaling with respect to the numbers of streams and processing cores.
AB - Efficient real-time analytics are an integral part of an increasing number of data management applications, such as computational targeted advertising, algorithmic trading, and Internet of Things. In this paper, we focus primarily on accelerating stream joins, which are arguably one of the most commonly used and resource-intensive operators in stream processing. We propose a scalable circular pipeline design (Circular-MJ) in hardware to orchestrate a multiway join while minimizing data flow disruption. In this circular design, each new tuple (given its origin stream) starts its processing from a specific join core and passes through all respective join cores in a pipeline sequence to produce the final results. We also present a novel two-stage pipeline stream join (Stashed-MJ) that uses a best-effort buffering technique (referred to as stash) to maintain intermediate results. If an overwrite is detected in the stash, our design automatically resorts to recomputing intermediate results. Finally, we present a parallelized version of our multiway stream join by integrating our proposed pipelines into a parallel unidirectional flow-based architecture (Parallel-MJ). Our experimental results demonstrate a linear throughput scaling with respect to the numbers of streams and processing cores.
KW - Dataflow architectures
KW - hardware architecture
KW - multiple data stream architectures
KW - pattern matching
UR - http://www.scopus.com/inward/record.url?scp=85096133682&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2916860
DO - 10.1109/TKDE.2019.2916860
M3 - Article
AN - SCOPUS:85096133682
SN - 1041-4347
VL - 32
SP - 2438
EP - 2452
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 12
M1 - 8713929
ER -