User-Defined Operators: Efficiently Integrating Custom Algorithms into Modern Databases

Moritz Sichert, Thomas Neumann

Research output: Contribution to journal › Conference article › peer-review

11 Scopus citations

Abstract

In recent years, complex data mining and machine learning algorithms have become more common in data analytics. Several specialized systems, built to efficiently execute different types of complex analytics queries, exist to evaluate these algorithms on ever-growing data sets. However, using these various systems comes at a price. Moving data out of traditional database systems is often slow because it requires exporting and importing data, typically via the relatively inefficient CSV format. Additionally, database systems usually offer strong ACID guarantees, which are lost when new, external systems are added. This loss can compromise the consistency of the results. Nevertheless, most data scientists still prefer not to use classical database systems for data analytics. The main reason is that SQL is difficult to work with due to its declarative and set-oriented nature and is not easily extensible. We present User-Defined Operators (UDOs) as a concept for integrating custom algorithms into modern query engines. Users can write idiomatic code in the programming language of their choice, which is then directly integrated into existing database systems. We show that our implementation can compete with specialized tools and existing query engines while retaining all beneficial properties of the database system.

Original language: English
Pages (from-to): 1119-1131
Number of pages: 13
Journal: Contemporary Mathematics
Volume: 15
Issue number: 5
State: Published - 2022
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sep 2022 → 9 Sep 2022
