TY - JOUR
T1 - CorBit
T2 - Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023
AU - Lyu, Xi
AU - Kipf, Andreas
AU - Pfeil, Pascal
AU - Horn, Dominik
AU - Giceva, Jana
AU - Kraska, Tim
N1 - Publisher Copyright:
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2023
Y1 - 2023
N2 - A bitmap index is a secondary index structure that supports equality and range predicates. In its simplest form, a bitmap index stores one bitmap per unique column value indicating qualifying tuples. To use such indexes in large-scale data warehousing, they need to be space efficient. Existing schemes such as Roaring can compress individual bitmaps but do not consider cross-column compression. In this paper, we introduce CorBit, a technique that leverages column correlations to compress bitmap indexes on a given table. The high-level idea is to only store the bits that need to be flipped (the diff) when encoding the bitmaps of a column that correlates with another column that already has a bitmap index in place. CorBit automatically determines which columns to store an index for and which column bitmaps to diff-encode, minimizing the overall size of the index. Compared to Roaring, CorBit consumes 9.1% less space on the DMV dataset while incurring a 12.6% runtime overhead.
AB - A bitmap index is a secondary index structure that supports equality and range predicates. In its simplest form, a bitmap index stores one bitmap per unique column value indicating qualifying tuples. To use such indexes in large-scale data warehousing, they need to be space efficient. Existing schemes such as Roaring can compress individual bitmaps but do not consider cross-column compression. In this paper, we introduce CorBit, a technique that leverages column correlations to compress bitmap indexes on a given table. The high-level idea is to only store the bits that need to be flipped (the diff) when encoding the bitmaps of a column that correlates with another column that already has a bitmap index in place. CorBit automatically determines which columns to store an index for and which column bitmaps to diff-encode, minimizing the overall size of the index. Compared to Roaring, CorBit consumes 9.1% less space on the DMV dataset while incurring a 12.6% runtime overhead.
UR - http://www.scopus.com/inward/record.url?scp=85171254331&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85171254331
SN - 1613-0073
VL - 3462
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 28 August 2023 through 1 September 2023
ER -