Generalized density attractor clustering for incomplete data

Richard Leibrandt, Stephan Günnemann

Publication: Contribution to journal, article, peer-reviewed

1 citation (Scopus)

Abstract

Mean shift is a popular and powerful clustering method for implementing density attractor clustering (DAC). However, DAC is underdeveloped in terms of modeling definitions and methods for incomplete data. Due to DAC’s importance, solving this common issue is crucial. This work makes DAC more versatile by making it applicable to incomplete data: First, using formal modeling definitions, we propose a unifying framework for DAC. Second, we propose new methods that implement the definitions and perform DAC for incomplete data more efficiently and stably than others. We discuss and compare our methods and the closest competitor using theoretical analyses. We quantify the performance of our methods using synthetic datasets with known structures and real-life business data for three missing value types. Finally, we analyze Stack Overflow’s 2021 survey to extract clusters of programmers from India and the USA. The experiments verify our methods’ superiority to six alternatives. Code, Data: https://bit.ly/genDAC
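
For readers unfamiliar with the setting, the following is a minimal sketch of Gaussian mean shift run on data with missing values (marked as NaN). It is not the method proposed in the article. It uses a simple partial-distance heuristic as an assumption: each data point contributes only through its observed coordinates, and its squared distance is rescaled by the fraction of coordinates observed. The names mean_shift_incomplete and label_modes, the column-mean start for missing entries, and the merge radius are all hypothetical choices made for illustration. The sketch further assumes that every row and every column contains at least one observed value.

```python
import numpy as np


def mean_shift_incomplete(X, bandwidth=1.0, n_iter=50, tol=1e-6):
    """Gaussian mean shift where NaN marks a missing value.

    Illustrative heuristic only (rescaled partial distances),
    not the method of the article.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                     # (n, d) mask of observed entries
    X_filled = np.where(observed, X, 0.0)       # zeros are masked out below
    n_obs = observed.sum(axis=1)                # observed coordinates per point
    d = X.shape[1]

    # Start each trajectory at its own point; missing entries start at the
    # column mean of the observed values (an assumption of this sketch).
    modes = np.where(observed, X, np.nanmean(X, axis=0))

    for _ in range(n_iter):
        # Differences restricted to the observed coordinates of data point j.
        diff = (modes[:, None, :] - X_filled[None, :, :]) * observed[None, :, :]
        # Partial squared distance, rescaled to the full dimensionality.
        sq_dist = (diff ** 2).sum(axis=2) * d / n_obs[None, :]
        w = np.exp(-0.5 * sq_dist / bandwidth ** 2)          # (n, n) weights

        # Per-dimension weighted mean over the points observing that dimension.
        new_modes = (w @ X_filled) / (w @ observed.astype(float))

        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    return modes


def label_modes(modes, merge_radius):
    """Greedily merge converged modes that lie within merge_radius."""
    labels = np.full(len(modes), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if np.linalg.norm(m - center) < merge_radius:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```

Points whose trajectories end at the same attractor receive the same label, so calling label_modes(mean_shift_incomplete(X, bandwidth=1.0), merge_radius=0.5) yields one cluster label per row of X, including rows with missing entries.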

Original language: English
Pages (from - to): 970-1009
Number of pages: 40
Journal: Data Mining and Knowledge Discovery
Volume: 37
Issue number: 2
Publication status: Published - March 2023
