TY - GEN
T1 - An online incremental clustering framework for real-time stream analytics
AU - Salort Sanchez, Carlos
AU - Tudoran, Radu
AU - Al Hajj Hassan, Mohamad
AU - Bortoli, Stefano
AU - Brasche, Goetz
AU - Baumbach, Jan
AU - Axenie, Cristian
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - With the evolution of data acquisition methods, our ability to collect real-time data has increased. This requires the development of real-time analytics that use the most recent data to generate valuable insights. One example is customer profiling, where we want to identify groups of similar clients who were active recently in order to improve the quality of the suggestions. Traditional clustering algorithms perform well on finite datasets, but their execution is often not compatible with real-time requirements, especially for rapidly changing trends. In this context, we propose a novel approach for defining incremental clustering algorithms that work online within real-time constraints while preserving accuracy. We show the general applicability of the framework by applying this method to three different clustering algorithms. We compare the experimental results of the traditional and online approaches, evaluating accuracy and computational cost. The results show that algorithms executed in our framework achieve accuracy comparable to their offline implementations, with a large gain in execution time of up to three orders of magnitude on average.
KW - Data Stream
KW - Data Stream Clustering
KW - Online Clustering
KW - Online Learning
UR - http://www.scopus.com/inward/record.url?scp=85080909949&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2019.00243
DO - 10.1109/ICMLA.2019.00243
M3 - Conference contribution
AN - SCOPUS:85080909949
T3 - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
SP - 1480
EP - 1485
BT - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
A2 - Wani, M. Arif
A2 - Khoshgoftaar, Taghi M.
A2 - Wang, Dingding
A2 - Wang, Huanjing
A2 - Seliya, Naeem
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Y2 - 16 December 2019 through 19 December 2019
ER -