Relevance matters: Capitalizing on less - Top-k matching in publish/subscribe

Mohammad Sadoghi, Hans Arno Jacobsen

Research output: Contribution to journalConference articlepeer-review

16 Scopus citations

Abstract

The efficient processing of large collections of Boolean expressions plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. Emerging applications such as computational advertising and selective information dissemination demand determining and presenting to an end-user only the most relevant content that is both user-consumable and suitable for limited screen real estate of target devices. To retrieve the most relevant content, we present BE*-Tree, a novel indexing data structure designed for effective hierarchical top-k pattern matching, which as its by-product also reduces the operational cost of processing millions of patterns. To further reduce processing cost, BE*-Tree employs an adaptive and non-rigid space-cutting technique designed to efficiently index Boolean expressions over a high-dimensional continuous space. At the core of BE*-Tree lie two innovative ideas: (1) a bi-directional tree expansion build as a top-down (data and space clustering) and a bottom-up growths (space clustering), which together enable indexing only non-empty continuous sub-spaces, and (2) an overlap-free splitting strategy. Finally, the performance of BE*-Tree is proven through a comprehensive experimental comparison against state-of-the-art index structures for matching Boolean expressions.

Original languageEnglish
Article number6228133
Pages (from-to)786-797
Number of pages12
JournalProceedings - International Conference on Data Engineering
DOIs
StatePublished - 2012
Externally publishedYes
EventIEEE 28th International Conference on Data Engineering, ICDE 2012 - Arlington, VA, United States
Duration: 1 Apr 20125 Apr 2012

Fingerprint

Dive into the research topics of 'Relevance matters: Capitalizing on less - Top-k matching in publish/subscribe'. Together they form a unique fingerprint.

Cite this