TY - GEN
T1 - SQL- and operator-centric data analytics in relational main-memory databases
AU - Passing, Linnea
AU - Then, Manuel
AU - Hubig, Nina
AU - Lang, Harald
AU - Schreier, Michael
AU - Günnemann, Stephan
AU - Kemper, Alfons
AU - Neumann, Thomas
N1 - Publisher Copyright: © 2017, Copyright is with the authors.
PY - 2017
Y1 - 2017
N2 - Data volume and complexity continue to increase, as does the need for insight into data. Today, data management and data analytics are most often conducted in separate systems: database systems and dedicated analytics systems. This separation leads to time- and resource-consuming data transfer, stale data, and complex IT architectures. In this paper, we show that relational main-memory database systems are capable of executing analytical algorithms in a fully transactional environment while still exceeding the performance of state-of-the-art analytical systems, rendering the division of data management and data analytics unnecessary. We classify and assess multiple ways of integrating data analytics into database systems. Based on this assessment, we extend SQL with a non-appending iteration construct that provides an important building block for analytical algorithms while retaining the high expressiveness of the original language. Furthermore, we propose the integration of analytics operators directly into the database core, where algorithms can be highly tuned for modern hardware. These operators can be parameterized with our novel user-defined lambda expressions. As we integrate lambda expressions into SQL instead of proposing a new proprietary query language, we ensure usability for diverse groups of users. Additionally, we carry out an extensive experimental evaluation of our approaches in HyPer, our full-fledged SQL main-memory database system, and show their superior performance in comparison to dedicated solutions.
KW - Computational databases
KW - HyPer
UR - http://www.scopus.com/inward/record.url?scp=85032353288&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2017.09
DO - 10.5441/002/edbt.2017.09
M3 - Conference contribution
AN - SCOPUS:85032353288
T3 - Advances in Database Technology - EDBT
SP - 84
EP - 95
BT - Advances in Database Technology - EDBT 2017
A2 - Mitschang, Bernhard
A2 - Markl, Volker
A2 - Bress, Sebastian
A2 - Andritsos, Periklis
A2 - Sattler, Kai-Uwe
A2 - Orlando, Salvatore
PB - OpenProceedings.org
T2 - 20th International Conference on Extending Database Technology, EDBT 2017
Y2 - 21 March 2017 through 24 March 2017
ER -