Generalized density attractor clustering for incomplete data

Richard Leibrandt, Stephan Günnemann

Research output: Contribution to journal › Article › peer-review

Abstract

Mean shift is a popular and powerful clustering method for implementing density attractor clustering (DAC). However, DAC is underdeveloped in terms of modeling definitions and methods for incomplete data. Due to DAC’s importance, solving this common issue is crucial. This work makes DAC more versatile by making it applicable to incomplete data: First, using formal modeling definitions, we propose a unifying framework for DAC. Second, we propose new methods that implement the definitions and perform DAC for incomplete data more efficiently and stably than others. We discuss and compare our methods and the closest competitor using theoretical analyses. We quantify the performance of our methods using synthetic datasets with known structures and real-life business data for three missing value types. Finally, we analyze Stack Overflow’s 2021 survey to extract clusters of programmers from India and the USA. The experiments verify our methods’ superiority to six alternatives. Code, Data: https://bit.ly/genDAC
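For readers unfamiliar with the mean shift procedure named in the abstract, the following is a minimal sketch of textbook Gaussian mean shift on complete data. It is not the authors' method and does not handle missing values. The function names `mean_shift` and `label_by_attractor`, the parameters `bandwidth` and `merge_radius`, and the synthetic two-blob data are all assumptions made for illustration.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=100, tol=1e-6):
    """Move every point uphill on a Gaussian kernel density estimate.

    Textbook mean shift for complete data only (illustrative assumption,
    not the incomplete-data method from the paper).
    """
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Squared distances between current positions and the original data.
        diff = modes[:, None, :] - points[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Mean shift update: kernel-weighted average of the data.
        new_modes = weights @ points / weights.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(new_modes - modes, axis=1))
        modes = new_modes
        if shift < tol:
            break
    return modes

def label_by_attractor(modes, merge_radius=0.5):
    """Give points whose attractors lie within merge_radius the same label."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing attractor is close enough: start a new cluster.
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Hypothetical synthetic data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
                  rng.normal(5.0, 0.3, size=(30, 2))])
modes = mean_shift(data, bandwidth=1.0)
labels, centers = label_by_attractor(modes)
```

Each point is assigned to the density attractor (mode) it converges to, which is the sense in which mean shift implements DAC. The bandwidth controls how many modes the density estimate has, so it would need tuning on real data.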

Original language: English
Pages (from-to): 970-1009
Number of pages: 40
Journal: Data Mining and Knowledge Discovery
Volume: 37
Issue number: 2
State: Published - Mar 2023

Keywords

  • Clustering
  • Incomplete datasets
  • Kernel density estimation
  • Missing values
