TY - JOUR
T1 - PS-tree-based efficient boolean expression matching for highdimensional and dense workloads
AU - Ji, Shuping
AU - Jacobsen, Hans Arno
N1 - Publisher Copyright:
© 2018 VLDB Endowment 21508097/18/07.
PY - 2018
Y1 - 2018
N2 - Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, in this paper, we design a data structure called PS-Tree that can efficiently index subscriptions in one dimension. By dividing predicates into disjoint predicate spaces, PS-Tree achieves high matching performance and good expressiveness. Based on PS-Tree, we first propose a Boolean expression matching algorithm PSTBloom. By efficiently filtering out a large proportion of unmatching subscriptions, PSTBloom achieves high matching performance, especially for high-dimensional workloads. PSTBloom also achieves fast index construction and a small memory footprint. Compared with state-of-theart methods, comprehensive experiments show that PSTBloom reduces matching time, index construction time and memory usage by up to 84%, 78% and 94%, respectively. Although PSTBloom is effective for many workload distributions, dense workloads represent new challenges to PSTBloom and other algorithms. To effectively handle dense workloads, we further propose the PSTHash algorithm, which divides subscriptions into disjoint multidimensional predicate spaces. This organization prunes partially matching subscriptions efficiently. Comprehensive experiments on both synthetic and real-world datasets show that PSTHash improves the matching performance by up to 92% for dense workloads.
AB - Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, in this paper, we design a data structure called PS-Tree that can efficiently index subscriptions in one dimension. By dividing predicates into disjoint predicate spaces, PS-Tree achieves high matching performance and good expressiveness. Based on PS-Tree, we first propose a Boolean expression matching algorithm PSTBloom. By efficiently filtering out a large proportion of unmatching subscriptions, PSTBloom achieves high matching performance, especially for high-dimensional workloads. PSTBloom also achieves fast index construction and a small memory footprint. Compared with state-of-theart methods, comprehensive experiments show that PSTBloom reduces matching time, index construction time and memory usage by up to 84%, 78% and 94%, respectively. Although PSTBloom is effective for many workload distributions, dense workloads represent new challenges to PSTBloom and other algorithms. To effectively handle dense workloads, we further propose the PSTHash algorithm, which divides subscriptions into disjoint multidimensional predicate spaces. This organization prunes partially matching subscriptions efficiently. Comprehensive experiments on both synthetic and real-world datasets show that PSTHash improves the matching performance by up to 92% for dense workloads.
UR - http://www.scopus.com/inward/record.url?scp=85061728540&partnerID=8YFLogxK
U2 - 10.14778/3291264.3291270
DO - 10.14778/3291264.3291270
M3 - Conference article
AN - SCOPUS:85061728540
SN - 2150-8097
VL - 12
SP - 251
EP - 264
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 3
T2 - 45th International Conference on Very Large Data Bases, VLDB 2019
Y2 - 26 August 2017 through 30 August 2017
ER -