Exploiting Code Generation for Efficient LIKE Pattern Matching

Adrian Riedl, Philipp Fent, Maximilian Bandle, Thomas Neumann

Research output: Contribution to journal › Conference article › peer-review

Abstract

Efficiently evaluating text pattern matching is one of the most common computationally expensive tasks in data processing pipelines. Especially when dealing with text-heavy real-world data, evaluating even simple LIKE predicates is costly. Despite the abundance of text and the frequency of string-handling expressions in real-world queries, processing is an afterthought for most systems. We argue that we must instead properly integrate text processing into the flow of DBMS query execution. In this work, we propose a code generation approach that specifically tailors the generated code to the given pattern and matching algorithm and integrates cleanly into DBMS query compilation. In addition, we introduce a generalized SSE search algorithm that uses a sequence of SSE instructions to compare packed strings in the generated code to efficiently locate longer input patterns. Our approach of generating specialized code for each pattern eliminates the overhead of interpreting the pattern for each tuple. As a result, we improve the performance of LIKE pattern matching by up to 2.5×, demonstrating that code generation can significantly improve the efficiency of LIKE predicate evaluation in DBMSs.
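The central idea of the abstract, specializing the matcher once per pattern instead of re-interpreting the pattern for every tuple, can be illustrated with a small sketch. This is not the authors' implementation: the paper generates code inside DBMS query compilation and uses SSE instructions, whereas the sketch below only mimics the specialization step with Python closures. The function name `compile_like` is hypothetical, and the sketch assumes patterns that use only the `%` wildcard (no `_`, no escape characters).

```python
from typing import Callable


def compile_like(pattern: str) -> Callable[[str], bool]:
    """Return a matcher specialized for one LIKE pattern.

    Assumption: only the '%' wildcard is supported in this sketch.
    The pattern is analyzed once; the returned closure does no parsing.
    """
    if "_" in pattern:
        raise ValueError("this sketch only handles the '%' wildcard")

    parts = pattern.split("%")
    if len(parts) == 1:
        # No wildcard at all: plain equality.
        literal = parts[0]
        return lambda s: s == literal

    prefix, suffix = parts[0], parts[-1]
    middle = [p for p in parts[1:-1] if p]  # drop empties from '%%'

    # Common shapes get a dedicated, minimal matcher.
    if not middle:
        if not prefix and not suffix:
            return lambda s: True                  # '%'
        if not suffix:
            return lambda s: s.startswith(prefix)  # 'abc%'
        if not prefix:
            return lambda s: s.endswith(suffix)    # '%abc'
    if not prefix and not suffix and len(middle) == 1:
        needle = middle[0]
        return lambda s: needle in s               # '%abc%'

    # General case: anchored prefix and suffix, ordered infixes between.
    min_len = len(prefix) + len(suffix)

    def match(s: str) -> bool:
        if len(s) < min_len:
            return False
        if not s.startswith(prefix) or not s.endswith(suffix):
            return False
        pos = len(prefix)
        end = len(s) - len(suffix)
        for part in middle:
            # Leftmost match is sufficient for '%'-only patterns.
            pos = s.find(part, pos, end)
            if pos < 0:
                return False
            pos += len(part)
        return True

    return match


# Usage: specialize once, then apply the matcher to every tuple.
rows = ["big database", "flat file", "metadata store"]
is_match = compile_like("%data%")
matching_rows = [r for r in rows if is_match(r)]
```

The pattern analysis runs once, outside the per-tuple loop, which is the overhead the abstract says specialized code removes. A compiling DBMS would emit machine code for the chosen branch rather than a closure.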

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3462
State: Published - 2023
Event: Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023 - Vancouver, Canada
Duration: 28 Aug 2023 – 1 Sep 2023
