Algebraic query optimization for distributed top-k queries

Thomas Neumann, Sebastian Michel

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan using compact data synopses for selectivity estimation that is the basis for our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.

Original languageEnglish
Pages (from-to)197-211
Number of pages15
JournalInformatik - Forschung und Entwicklung
Volume21
Issue number3-4
DOIs
StatePublished - Jun 2007
Externally publishedYes

Keywords

  • Distributed systems
  • Query optimization
  • Top-k queries

Fingerprint

Dive into the research topics of 'Algebraic query optimization for distributed top-k queries'. Together they form a unique fingerprint.

Cite this