Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness

Sven Helmer, Thomas Neumann, Guido Moerkotte

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

This chapter introduces a new parameter, the clusteredness of data, and shows how it can be used to estimate the output cardinality of a partial preaggregation (PPA) operator. An accurate estimate of this output cardinality is a prerequisite for a query optimizer to decide whether to apply PPA. Several factors influence the quality of an early aggregation step: the number of groups, the number of tuples in the input, the distributions of the group sizes, the size of the buffer, the buffer replacement strategy used by the algorithm, and the clusteredness of the data. Previous analyses of PPA did not consider the clusteredness of the data but assumed randomized data. Thorough experiments demonstrate the quality of the approximations. The results are very promising: the cardinality estimation based on the measure of clusteredness is highly accurate.
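
To make the role of clusteredness concrete, the following is a minimal sketch of a buffer-limited early aggregation step that counts how many partial aggregates it emits. It is an illustration only, not the chapter's estimator or its measure of clusteredness: the function name `ppa_output_cardinality`, the use of simple counts as the aggregate, and the least-recently-used replacement strategy are all assumptions made for this example.

```python
from collections import OrderedDict


def ppa_output_cardinality(keys, buffer_size):
    """Simulate partial preaggregation and return the number of emitted tuples.

    Assumptions (not taken from the chapter): the aggregate is a count,
    and the buffer evicts the least recently used group when it is full.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")

    buffer = OrderedDict()  # group key -> partial count, oldest entry first
    emitted = 0

    for key in keys:
        if key in buffer:
            # Group already buffered: fold the tuple into its partial aggregate.
            buffer[key] += 1
            buffer.move_to_end(key)
        else:
            if len(buffer) >= buffer_size:
                # Buffer full: evict the least recently used group and emit it.
                buffer.popitem(last=False)
                emitted += 1
            buffer[key] = 1

    # At end of input, every group still in the buffer is emitted as well.
    return emitted + len(buffer)


clustered = [0, 0, 0, 1, 1, 1, 2, 2, 2]    # tuples of a group arrive together
interleaved = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # same tuples, groups spread out

print(ppa_output_cardinality(clustered, buffer_size=1))    # 3
print(ppa_output_cardinality(interleaved, buffer_size=1))  # 9
print(ppa_output_cardinality(interleaved, buffer_size=3))  # 3
```

Both inputs contain the same nine tuples in three groups, yet with a one-entry buffer the clustered ordering is reduced to three output tuples while the interleaved ordering is not reduced at all. An estimate that assumes randomized data cannot distinguish these two cases.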

Original language: English
Title of host publication: Proceedings 2003 VLDB Conference
Subtitle of host publication: 29th International Conference on Very Large Databases (VLDB)
Publisher: Elsevier
Pages: 656-667
Number of pages: 12
ISBN (Electronic): 9780127224428
State: Published - 1 Jan 2003
Externally published: Yes
