PS-tree-based efficient boolean expression matching for highdimensional and dense workloads

Shuping Ji, Hans Arno Jacobsen

Research output: Contribution to journalConference articlepeer-review

16 Scopus citations

Abstract

Boolean expression matching is an important function for many applications. However, existing solutions still suffer from limitations when applied to high-dimensional and dense workloads. To overcome these limitations, in this paper, we design a data structure called PS-Tree that can efficiently index subscriptions in one dimension. By dividing predicates into disjoint predicate spaces, PS-Tree achieves high matching performance and good expressiveness. Based on PS-Tree, we first propose a Boolean expression matching algorithm PSTBloom. By efficiently filtering out a large proportion of unmatching subscriptions, PSTBloom achieves high matching performance, especially for high-dimensional workloads. PSTBloom also achieves fast index construction and a small memory footprint. Compared with state-of-theart methods, comprehensive experiments show that PSTBloom reduces matching time, index construction time and memory usage by up to 84%, 78% and 94%, respectively. Although PSTBloom is effective for many workload distributions, dense workloads represent new challenges to PSTBloom and other algorithms. To effectively handle dense workloads, we further propose the PSTHash algorithm, which divides subscriptions into disjoint multidimensional predicate spaces. This organization prunes partially matching subscriptions efficiently. Comprehensive experiments on both synthetic and real-world datasets show that PSTHash improves the matching performance by up to 92% for dense workloads.

Original languageEnglish
Pages (from-to)251-264
Number of pages14
JournalProceedings of the VLDB Endowment
Volume12
Issue number3
DOIs
StatePublished - 2018
Event45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: 26 Aug 201730 Aug 2017

Fingerprint

Dive into the research topics of 'PS-tree-based efficient boolean expression matching for highdimensional and dense workloads'. Together they form a unique fingerprint.

Cite this