Subspace clustering for indexing high dimensional data: A main memory index based on local reductions and individual multi-representations

Stephan Günnemann, Hardy Kremer, Dominik Lenhard, Thomas Seidl

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

Fast similarity search in high dimensional feature spaces is crucial in today's applications. Since the performance of traditional index structures degrades with increasing dimensionality, concepts were developed to cope with this curse of dimensionality. Most of the existing concepts exploit global correlations between dimensions to reduce the dimensionality of the feature space. In high dimensional data, however, correlations are often locally constrained to a subset of the data and every object can participate in several of these correlations. Accordingly, discarding the same set of dimensions for each object based on global correlations and ignoring the different correlations of single objects leads to significant loss of information. These aspects are relevant due to the direct correspondence between the degree of information preserved and the achievable query performance. We introduce a novel main memory index structure with increased information content for each single object compared to a global approach. This is achieved by using individual dimensions for each data object by applying the method of subspace clustering. The structure of our index is based on a multi-representation of objects reflecting their multiple correlations; that is, besides the general increase of information per object, we provide several individual representations for each single data object. These multiple views correspond to different local reductions per object and enable more effective pruning. In thorough experiments on real and synthetic data, we demonstrate that our novel solution achieves low query times and outperforms existing approaches designed for high dimensional data.

Original languageEnglish
Title of host publicationAdvances in Database Technology - EDBT 2011
Subtitle of host publication14th International Conference on Extending Database Technology, Proceedings
PublisherAssociation for Computing Machinery
Pages237-248
Number of pages12
ISBN (Print)9781450305280
DOIs
StatePublished - 2011
Externally publishedYes
Event14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011 - Uppsala, Sweden
Duration: 22 Mar 201124 Mar 2011

Publication series

NameACM International Conference Proceeding Series

Conference

Conference14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
Country/TerritorySweden
CityUppsala
Period22/03/1124/03/11

Keywords

  • Algorithms
  • Performance

Fingerprint

Dive into the research topics of 'Subspace clustering for indexing high dimensional data: A main memory index based on local reductions and individual multi-representations'. Together they form a unique fingerprint.

Cite this