Abstract
Mean shift is a popular and powerful clustering method for implementing density attractor clustering (DAC). However, formal modeling definitions and methods for DAC on incomplete data are underdeveloped. Given DAC’s importance, addressing this common issue is crucial. This work makes DAC more versatile by making it applicable to incomplete data. First, using formal modeling definitions, we propose a unifying framework for DAC. Second, we propose new methods that implement these definitions and perform DAC on incomplete data more efficiently and stably than other approaches. We discuss our methods and compare them with the closest competitor through theoretical analyses. We quantify the performance of our methods on synthetic datasets with known structures and on real-life business data for three missing-value types. Finally, we analyze Stack Overflow’s 2021 survey to extract clusters of programmers from India and the USA. The experiments verify our methods’ superiority over six alternatives. Code and data: https://bit.ly/genDAC
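For readers unfamiliar with density attractor clustering, the sketch below shows textbook mean shift on complete data. It is not the authors' method for incomplete data. It assumes a Gaussian kernel with a single bandwidth `h`. Each point is moved repeatedly to the kernel-weighted mean of the dataset until it stops at a local mode (a density attractor) of the kernel density estimate. Points whose attractors lie within a merge radius then share a cluster. The function names, the tolerance, and the default merge radius of `h / 2` are hypothetical choices made for illustration. The squared-distance helper requires every coordinate to be present, which is exactly the limitation that incomplete data raises.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; assumes no missing coordinates."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def shift_to_attractor(x: Sequence[float], data: Sequence[Sequence[float]],
                       bandwidth: float, tol: float = 1e-6,
                       max_iter: int = 500) -> Point:
    """Follow Gaussian-kernel mean shift from x until the step falls below tol."""
    current = list(x)
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to the current position.
        weights = [math.exp(-_sq_dist(current, p) / (2 * bandwidth ** 2)) for p in data]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; stop where we are
            return current
        # Mean shift update: kernel-weighted mean of the data.
        new = [sum(w * p[d] for w, p in zip(weights, data)) / total
               for d in range(len(current))]
        if _sq_dist(new, current) < tol ** 2:
            return new
        current = new
    return current


def mean_shift_labels(data: Sequence[Sequence[float]], bandwidth: float,
                      merge_radius: Optional[float] = None
                      ) -> Tuple[List[int], List[Point]]:
    """Assign each point the index of the density attractor it converges to."""
    if merge_radius is None:
        merge_radius = bandwidth / 2  # hypothetical default
    attractors: List[Point] = []
    labels: List[int] = []
    for x in data:
        a = shift_to_attractor(x, data, bandwidth)
        # Reuse an existing attractor if this one lies within merge_radius of it.
        for k, c in enumerate(attractors):
            if _sq_dist(a, c) <= merge_radius ** 2:
                labels.append(k)
                break
        else:
            attractors.append(a)
            labels.append(len(attractors) - 1)
    return labels, attractors
```

For example, `mean_shift_labels([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]], bandwidth=0.5)` should assign the first three points to one attractor and the last three to another. The bandwidth controls how many attractors appear: a larger `h` smooths the density estimate and merges modes.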
Original language | English |
---|---|
Pages (from - to) | 970-1009 |
Number of pages | 40 |
Journal | Data Mining and Knowledge Discovery |
Volume | 37 |
Issue | 2 |
DOIs | |
Publication status | Published - March 2023 |