CorBit: Leveraging Correlations for Compressing Bitmap Indexes

Xi Lyu, Andreas Kipf, Pascal Pfeil, Dominik Horn, Jana Giceva, Tim Kraska

Publikation: Beitrag in FachzeitschriftKonferenzartikelBegutachtung

Abstract

A bitmap index is a secondary index structure that supports equality and range predicates. In its simplest form, a bitmap index stores one bitmap per unique column value indicating qualifying tuples. To use such indexes in large-scale data warehousing, they need to be space efficient. Existing schemes such as Roaring can compress individual bitmaps but do not consider cross-column compression. In this paper, we introduce CorBit, a technique that leverages column correlations to compress bitmap indexes on a given table. The high-level idea is to only store the bits that need to be flipped (the diff) when encoding the bitmaps of a column that correlates with another column that already has a bitmap index in place. CorBit automatically determines which columns to store an index for and which column bitmaps to diff-encode, minimizing the overall size of the index. Compared to Roaring, CorBit consumes 9.1% less space on the DMV dataset while incurring a 12.6% runtime overhead.

OriginalspracheEnglisch
FachzeitschriftCEUR Workshop Proceedings
Jahrgang3462
PublikationsstatusVeröffentlicht - 2023
VeranstaltungJoint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023 - Vancouver, Kanada
Dauer: 28 Aug. 20231 Sept. 2023

Fingerprint

Untersuchen Sie die Forschungsthemen von „CorBit: Leveraging Correlations for Compressing Bitmap Indexes“. Zusammen bilden sie einen einzigartigen Fingerprint.

Dieses zitieren