TY - JOUR
T1 - LogRule
T2 - Efficient Structured Log Mining for Root Cause Analysis
AU - Notaro, Paolo
AU - Haeri, Soroush
AU - Cardoso, Jorge
AU - Gerndt, Michael
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
AB - Accurate, timely Root Cause Analysis (RCA) is essential to successful IT operations as a primary step toward incident remediation. Automating RCA with data mining techniques in large heterogeneous systems is, however, a challenging task, because it requires correlating multimodal information across various data sources. An increasing number of services are migrating to structured logging to enable automated monitoring and debugging of complex large-scale systems. In this paper, we leverage structured logs and association rule mining (ARM) to automate RCA. We propose the LogRule algorithm, which automatically analyzes structured logs to generate a list of explanations for an event of interest. It achieves an F1-score of 0.921 on the diagnosis task while computing results 37x faster than the state-of-the-art solution based on FP-growth, making it a time-efficient, accurate, and interpretable ARM-based RCA algorithm. Evaluation results show that LogRule enables RCA in complex multidimensional datasets, where the execution time of the current state-of-the-art algorithm is prohibitively large.
KW - AIOps
KW - Root cause analysis
KW - data mining
KW - large-scale computing environment
UR - http://www.scopus.com/inward/record.url?scp=85160999692&partnerID=8YFLogxK
U2 - 10.1109/TNSM.2023.3282270
DO - 10.1109/TNSM.2023.3282270
M3 - Article
AN - SCOPUS:85160999692
SN - 1932-4537
VL - 20
SP - 4231
EP - 4243
JO - IEEE Transactions on Network and Service Management
JF - IEEE Transactions on Network and Service Management
IS - 4
M1 - 3282270
ER -