Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling

Lars Heling, Maribel Acosta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

RDF dataset profiles provide a formal representation of a dataset’s characteristics (features). These profiles may cover various aspects of the data represented in the dataset as well as statistical descriptors of the data distribution. In this work, we focus on the characteristic sets profile feature, which summarizes the characteristic sets contained in an RDF graph. As this type of feature provides detailed information on both the structure and semantics of RDF graphs, it can be very beneficial in query optimization. However, in decentralized query processing, computing it is challenging, as accessing and processing all datasets is difficult and/or costly. To overcome this shortcoming, we propose the concept of profile feature estimation. We present sampling methods and projection functions to generate estimations which aim to be as similar as possible to the original characteristic sets profile feature. In our evaluation, we investigate the feasibility of the proposed methods on four RDF graphs. Our results show that samples containing 0.5% of the entities in the graph allow for good estimations and may be used by downstream tasks such as query plan optimization in decentralized querying.

Original language: English
Title of host publication: The Semantic Web - 17th International Conference, ESWC 2020, Proceedings
Editors: Andreas Harth, Sabrina Kirrane, Axel-Cyrille Ngonga Ngomo, Heiko Paulheim, Anisa Rula, Anna Lisa Gentile, Peter Haase, Michael Cochez
Publisher: Springer
Pages: 157-175
Number of pages: 19
ISBN (Print): 9783030494605
State: Published - 2020
Externally published: Yes
Event: 17th Extended Semantic Web Conference, ESWC 2020 - Heraklion, Greece
Duration: 31 May 2020 → 4 Jun 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12123 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Extended Semantic Web Conference, ESWC 2020
Country/Territory: Greece
City: Heraklion
Period: 31/05/20 → 4/06/20
