Performance optimal filtering: Bloom overtakes cuckoo at high throughput

Harald Lang, Thomas Neumann, Alfons Kemper, Peter Boncz

Research output: Contribution to journalConference articlepeer-review

35 Scopus citations

Abstract

We define the concept of performance-optimal filtering to indicate the Bloom or Cuckoo filter configuration that best accelerates a particular task. While the space-precision tradeoffof these filters has been well studied, we show how to pick a filter that maximizes the performance for a given workload. This choice might be "suboptimal" relative to traditional space-precision metrics, but it will lead to better performance in practice. In this paper, we focus on highthroughput filter use cases, aimed at avoiding CPU work, e.g., a cache miss, a network message, or a local disk I/O - events that can happen at rates of millions to hundreds per second. Besides the false-positive rate and memory footprint of the filter, performance optimality has to take into account the absolute cost of the filter lookup as well as the saved work per lookup that filtering avoids; while the actual rate of negative lookups in the workload determines whether using a filter improves overall performance at all. In the course of the paper, we introduce new filter variants, namely the register-blocked and cache-sectorized Bloom filters. We present new implementation techniques and perform an extensive evaluation on modern hardware platforms, including the wide-SIMD Skylake-X and Knights Landing. This experimentation shows that in high-throughput situations, the lower lookup cost of blocked Bloom filters allows them to overtake Cuckoo filters.

Original languageEnglish
Pages (from-to)502-515
Number of pages14
JournalProceedings of the VLDB Endowment
Volume12
Issue number5
DOIs
StatePublished - 2018
Event45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: 26 Aug 201730 Aug 2017

Fingerprint

Dive into the research topics of 'Performance optimal filtering: Bloom overtakes cuckoo at high throughput'. Together they form a unique fingerprint.

Cite this