TY - JOUR
T1 - Instant loading for main memory databases
AU - Mühlbauer, Tobias
AU - Rödiger, Wolf
AU - Seilbeck, Robert
AU - Reiser, Angelika
AU - Kemper, Alfons
AU - Neumann, Thomas
PY - 2013/9
Y1 - 2013/9
N2 - eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on the wire speed of the data source and the speed of the data sink, i.e., the disk. As the speed of network adapters and disks has stagnated in the past, loading has become a major bottleneck. The delays it is causing are now ubiquitous as text formats are a preferred storage format for reasons of portability. But the game has changed: Ever increasing main memory capacities have fostered the development of in-memory database systems and very fast network infrastructures are on the verge of becoming economical. While hardware limitations for fast loading have disappeared, current approaches for main memory databases fail to saturate the now available wire speeds of tens of Gbit/s. With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern super-scalar multi-core CPUs. Large main memory capacities and Instant Loading thereby facilitate a very efficient data staging processing model consisting of instantaneous load-work-unload cycles across data archives on a single node. Once data is loaded, updates and queries are efficiently processed with the flexibility, security, and high performance of relational main memory databases.
AB - eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. To analyze such data in traditional disk-based database systems, it needs to be bulk loaded, an operation whose performance largely depends on the wire speed of the data source and the speed of the data sink, i.e., the disk. As the speed of network adapters and disks has stagnated in the past, loading has become a major bottleneck. The delays it is causing are now ubiquitous as text formats are a preferred storage format for reasons of portability. But the game has changed: Ever increasing main memory capacities have fostered the development of in-memory database systems and very fast network infrastructures are on the verge of becoming economical. While hardware limitations for fast loading have disappeared, current approaches for main memory databases fail to saturate the now available wire speeds of tens of Gbit/s. With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern super-scalar multi-core CPUs. Large main memory capacities and Instant Loading thereby facilitate a very efficient data staging processing model consisting of instantaneous load-work-unload cycles across data archives on a single node. Once data is loaded, updates and queries are efficiently processed with the flexibility, security, and high performance of relational main memory databases.
UR - http://www.scopus.com/inward/record.url?scp=84891099231&partnerID=8YFLogxK
U2 - 10.14778/2556549.2556555
DO - 10.14778/2556549.2556555
M3 - Conference article
AN - SCOPUS:84891099231
SN - 2150-8097
VL - 6
SP - 1702
EP - 1713
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 14
T2 - 39th International Conference on Very Large Data Bases, VLDB 2012
Y2 - 26 August 2013 through 30 August 2013
ER -