TY - GEN
T1 - The gaussian bloom filter
AU - Werner, Martin
AU - Schönfeld, Mirco
N1 - Publisher Copyright:
© 2015, Springer International Publishing Switzerland, All rights Reserved.
PY - 2015
Y1 - 2015
N2 - Modern databases tailored to highly distributed, fault tolerant management of information for big data applications exploit a classical data structure for reducing disk and network I/O as well as for managing data distribution: The Bloom filter. This data structure allows to encode small sets of elements, typically the keys in a key-value store, into a small, constant-size data structure. In order to reduce memory consumption, this data structure suffers from false positives which lead to additional I/O operations and are therefore only harmful with respect to performance. With this paper, we propose an extension to the classical Bloom filter construction which facilitates the use of floating point coprocessors and GPUs or additional main memory in order to reduce false positives. The proposed data structure is compatible with the classical construction in the sense that the classical Bloom filter can be extracted in time linear to the size of the data structure and that the Bloom filter is a special case of our construction. We show that the approach provides a relevant gain with respect to the false positive rate. Implementations for Apache Cassandra, C++, and NVIDIA CUDA are given and support the feasibility and results of the approach.
AB - Modern databases tailored to highly distributed, fault tolerant management of information for big data applications exploit a classical data structure for reducing disk and network I/O as well as for managing data distribution: The Bloom filter. This data structure allows to encode small sets of elements, typically the keys in a key-value store, into a small, constant-size data structure. In order to reduce memory consumption, this data structure suffers from false positives which lead to additional I/O operations and are therefore only harmful with respect to performance. With this paper, we propose an extension to the classical Bloom filter construction which facilitates the use of floating point coprocessors and GPUs or additional main memory in order to reduce false positives. The proposed data structure is compatible with the classical construction in the sense that the classical Bloom filter can be extracted in time linear to the size of the data structure and that the Bloom filter is a special case of our construction. We show that the approach provides a relevant gain with respect to the false positive rate. Implementations for Apache Cassandra, C++, and NVIDIA CUDA are given and support the feasibility and results of the approach.
KW - Bloom filter
KW - Data structures
KW - Database design
UR - http://www.scopus.com/inward/record.url?scp=84942636652&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-18120-2_12
DO - 10.1007/978-3-319-18120-2_12
M3 - Conference contribution
AN - SCOPUS:84942636652
SN - 9783319181196
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 191
EP - 206
BT - Database Systems for Advanced Applications - 20th International Conference, DASFAA 2015, Proceedings Hanoi, Vietnam, April 20-23, 2015 Proceedings, Part I
A2 - Shahabi, Cyrus
A2 - Cheema, Muhammad Aamir
A2 - Renz, Matthias
A2 - Zhou, Xiaofang
PB - Springer Verlag
T2 - 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015
Y2 - 20 April 2015 through 23 April 2015
ER -