TY - GEN
T1 - Scalable Infrastructure and Workflow for Anomaly Detection in an Automotive Industry
AU - Jindal, Anshul
AU - Gerndt, Michael
AU - Bauch, Mario
AU - Haddouti, Hachim
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - Anomalies are unexpected instances that deviate significantly from the normal patterns formed by the majority of a dataset. The more an observation deviates from the normal pattern, the more likely it is to be an anomaly. The continuous increase in the number of car models and configuration possibilities has led to a continuous increase in the complexity of the logistics supply chain and production. Consequently, it has become difficult to manage the whole IT landscape, and a small anomaly or failure somewhere in the system could lead to a huge loss of money. Quickly identifying and ultimately resolving a problem in such a system is therefore highly important. This paper addresses the challenge of identifying anomalies in a scalable way. Newly collected data lacks labels for training. The developed solution addresses this challenge by using multiple unsupervised algorithms and reporting as anomalies only those observations that all the algorithms flag as anomalous. The developed solution also tackles the problems of data heterogeneity and large data size by using Spark underneath for scalable data processing. Scalability test results demonstrate an 80% reduction in the training time for 100 transactions when using 10 cores instead of 1 core. The results of the study also show that increasing the number of cores does not necessarily reduce the overall execution time; other factors, such as communication between the cores and non-Spark-related processing tasks, can also influence the execution time.
AB - Anomalies are unexpected instances that deviate significantly from the normal patterns formed by the majority of a dataset. The more an observation deviates from the normal pattern, the more likely it is to be an anomaly. The continuous increase in the number of car models and configuration possibilities has led to a continuous increase in the complexity of the logistics supply chain and production. Consequently, it has become difficult to manage the whole IT landscape, and a small anomaly or failure somewhere in the system could lead to a huge loss of money. Quickly identifying and ultimately resolving a problem in such a system is therefore highly important. This paper addresses the challenge of identifying anomalies in a scalable way. Newly collected data lacks labels for training. The developed solution addresses this challenge by using multiple unsupervised algorithms and reporting as anomalies only those observations that all the algorithms flag as anomalous. The developed solution also tackles the problems of data heterogeneity and large data size by using Spark underneath for scalable data processing. Scalability test results demonstrate an 80% reduction in the training time for 100 transactions when using 10 cores instead of 1 core. The results of the study also show that increasing the number of cores does not necessarily reduce the overall execution time; other factors, such as communication between the cores and non-Spark-related processing tasks, can also influence the execution time.
KW - anomaly detection
KW - scalable
KW - scalable anomaly detection
KW - spark
KW - timeseries
UR - http://www.scopus.com/inward/record.url?scp=85084304648&partnerID=8YFLogxK
U2 - 10.1109/ICITIIT49094.2020.9071555
DO - 10.1109/ICITIIT49094.2020.9071555
M3 - Conference contribution
AN - SCOPUS:85084304648
T3 - 2020 International Conference on Innovative Trends in Information Technology, ICITIIT 2020
BT - 2020 International Conference on Innovative Trends in Information Technology, ICITIIT 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 International Conference on Innovative Trends in Information Technology, ICITIIT 2020
Y2 - 13 February 2020 through 14 February 2020
ER -