Assessing the significance of data mining results on graphs with feature vectors

Stephan Günnemann, Phuong Dao, Mohsen Jamali, Martin Ester

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Assessing the significance of data mining results is an important step in the knowledge discovery process. While results might appear interesting at first glance, they can often be explained by already known characteristics of the data. Randomization is an established technique for significance testing, and methods to assess data mining results on vector data or network data have been proposed. In many applications, however, both sources are given simultaneously. Since these sources are rarely independent of each other but highly correlated, naively applying existing randomization methods to each source separately is questionable. In this work, we present a method to assess the significance of mining results on graphs with binary feature vectors. We propose a novel null model that preserves correlation information between both sources. Our randomization exploits adaptive Metropolis sampling and interweaves attribute randomization and graph randomization steps. In thorough experiments, we demonstrate the application of our technique. Our results indicate that while simultaneously using both sources is beneficial, often one source of information is dominant for determining the mining results.

Original language: English
Title of host publication: Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Pages: 270-279
Number of pages: 10
State: Published - 2012
Externally published: Yes
Event: 12th IEEE International Conference on Data Mining, ICDM 2012 - Brussels, Belgium
Duration: 10 Dec 2012 to 13 Dec 2012

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 12th IEEE International Conference on Data Mining, ICDM 2012
Country/Territory: Belgium
City: Brussels
Period: 10/12/12 to 13/12/12
