Abstract
This chapter introduces a new parameter, the clusteredness of data, and shows how it can be used to estimate the output cardinality of a partial preaggregation operator. An accurate estimate of the output cardinality of partial preaggregation (PPA) is a prerequisite for a query optimizer to decide whether to apply it. Several factors influence the quality of an early aggregation step: the number of groups, the number of tuples in the input, the distribution of the group sizes, the size of the buffer, the buffer replacement strategy used by the algorithm, and the clusteredness of the data. Previous analyses of PPA assumed randomized data and did not consider clusteredness. Thorough experiments demonstrate the quality of the approximations: the results are very promising, as cardinality estimates based on the measure of clusteredness are highly accurate.
Original language | English |
---|---|
Title of host publication | Proceedings 2003 VLDB Conference |
Subtitle of host publication | 29th International Conference on Very Large Databases (VLDB) |
Publisher | Elsevier |
Pages | 656-667 |
Number of pages | 12 |
ISBN (Electronic) | 9780127224428 |
State | Published - 1 Jan 2003 |
Externally published | Yes |