TY - JOUR
T1 - Exploiting Code Generation for Efficient LIKE Pattern Matching
AU - Riedl, Adrian
AU - Fent, Philipp
AU - Bandle, Maximilian
AU - Neumann, Thomas
N1 - Publisher Copyright:
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2023
Y1 - 2023
N2 - Efficiently evaluating text pattern matching is one of the most common computationally expensive tasks in data processing pipelines. Especially when dealing with text-heavy real-world data, evaluating even simple LIKE predicates is costly. Despite the abundance of text and the frequency of string-handling expressions in real-world queries, processing is an afterthought for most systems. We argue that we must instead properly integrate text processing into the flow of DBMS query execution. In this work, we propose a code generation approach that specifically tailors the generated code to the given pattern and matching algorithm and integrates cleanly into DBMS query compilation. In addition, we introduce a generalized SSE search algorithm that uses a sequence of SSE instructions to compare packed strings in the generated code to efficiently locate longer input patterns. Our approach of generating specialized code for each pattern eliminates the overhead of interpreting the pattern for each tuple. As a result, we improve the performance of LIKE pattern matching by up to 2.5×, demonstrating that code generation can significantly improve the efficiency of LIKE predicate evaluation in DBMSs.
AB - Efficiently evaluating text pattern matching is one of the most common computationally expensive tasks in data processing pipelines. Especially when dealing with text-heavy real-world data, evaluating even simple LIKE predicates is costly. Despite the abundance of text and the frequency of string-handling expressions in real-world queries, processing is an afterthought for most systems. We argue that we must instead properly integrate text processing into the flow of DBMS query execution. In this work, we propose a code generation approach that specifically tailors the generated code to the given pattern and matching algorithm and integrates cleanly into DBMS query compilation. In addition, we introduce a generalized SSE search algorithm that uses a sequence of SSE instructions to compare packed strings in the generated code to efficiently locate longer input patterns. Our approach of generating specialized code for each pattern eliminates the overhead of interpreting the pattern for each tuple. As a result, we improve the performance of LIKE pattern matching by up to 2.5×, demonstrating that code generation can significantly improve the efficiency of LIKE predicate evaluation in DBMSs.
UR - http://www.scopus.com/inward/record.url?scp=85171255997&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85171255997
SN - 1613-0073
VL - 3462
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023
Y2 - 28 August 2023 through 1 September 2023
ER -