TY - GEN
T1 - In-Database Machine Learning with SQL on GPUs
AU - Schüle, Maximilian
AU - Lang, Harald
AU - Springer, Maximilian
AU - Kemper, Alfons
AU - Neumann, Thomas
AU - Günnemann, Stephan
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/6
Y1 - 2021/7/6
AB - In machine learning, continuously retraining a model guarantees accurate predictions based on the latest data as training input. But retrieving the latest data from a database requires time-consuming extraction, as database systems have rarely been used for operations such as matrix algebra and gradient descent. In this work, we demonstrate that SQL with recursive tables makes it possible to express a complete machine learning pipeline consisting of data preprocessing, model training and model validation. To facilitate the specification of loss functions, we extend the code-generating database system Umbra with an operator for automatic differentiation for use within recursive tables: with the loss function expressed in SQL as a lambda function, Umbra generates machine code for each partial derivative. We further use automatic differentiation for a dedicated gradient descent operator, which generates LLVM code to train a user-specified model on GPUs. We fine-tune GPU kernels at the hardware level to allow higher throughput and propose non-blocking synchronisation of multiple units. In our evaluation, automatic differentiation accelerated the runtime by a factor corresponding to the number of cached subexpressions, compared to compiling each derivative separately. Our GPU kernels with independent models allowed maximal throughput even for small batch sizes, making machine learning pipelines within SQL more competitive.
KW - Automatic Differentiation
KW - GPU
KW - In-Database Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85112779537&partnerID=8YFLogxK
U2 - 10.1145/3468791.3468840
DO - 10.1145/3468791.3468840
M3 - Conference contribution
AN - SCOPUS:85112779537
T3 - ACM International Conference Proceeding Series
SP - 25
EP - 36
BT - 33rd International Conference on Scientific and Statistical Database Management, SSDBM 2021, Proceedings
A2 - Zhu, Qiang
A2 - Zhu, Xingquan
A2 - Tu, Yicheng
A2 - Xu, Zichen
A2 - Kumar, Anand
PB - Association for Computing Machinery
T2 - 33rd International Conference on Scientific and Statistical Database Management, SSDBM 2021
Y2 - 6 July 2021
ER -
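
A minimal sketch of the kind of recursive SQL training loop the abstract describes (not code from the paper; the table points(x, y), the learning rate 0.05, and the iteration budget of 100 are illustrative assumptions): gradient descent fitting y ≈ a·x + b, with the partial derivatives of the squared-error loss written out by hand where Umbra's automatic differentiation operator would derive them from a lambda expression.

-- Hypothetical example: batch gradient descent as a recursive CTE.
-- Loss L(a, b) = AVG((a*x + b - y)^2); the SELECT list applies one
-- update step using dL/da = AVG(2*(a*x + b - y)*x) and
-- dL/db = AVG(2*(a*x + b - y)).
WITH RECURSIVE gd (iter, a, b) AS (
    SELECT 0, 0.0, 0.0                                   -- initial weights
  UNION ALL
    SELECT gd.iter + 1,
           gd.a - 0.05 * (SELECT AVG(2 * (gd.a * p.x + gd.b - p.y) * p.x)
                          FROM points p),                -- step along dL/da
           gd.b - 0.05 * (SELECT AVG(2 * (gd.a * p.x + gd.b - p.y))
                          FROM points p)                 -- step along dL/db
    FROM gd
    WHERE gd.iter < 100                                  -- fixed iteration budget
)
SELECT a, b FROM gd ORDER BY iter DESC LIMIT 1;          -- final model weights

Each recursion step scans the training data once and emits updated weights, so the whole training loop runs inside the database without any data extraction, which is the setting the paper targets.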