TY - GEN
T1 - Estimating Characteristic Sets for RDF Dataset Profiles Based on Sampling
AU - Heling, Lars
AU - Acosta, Maribel
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - RDF dataset profiles provide a formal representation of a dataset’s characteristics (features). These profiles may cover various aspects of the data represented in the dataset as well as statistical descriptors of the data distribution. In this work, we focus on the characteristic sets profile feature, which summarizes the characteristic sets contained in an RDF graph. As this type of feature provides detailed information on both the structure and semantics of RDF graphs, it can be very beneficial in query optimization. However, in decentralized query processing, computing it is challenging as it is difficult and/or costly to access and process all datasets. To overcome this shortcoming, we propose the concept of profile feature estimation. We present sampling methods and projection functions to generate estimations that aim to be as similar as possible to the original characteristic sets profile feature. In our evaluation, we investigate the feasibility of the proposed methods on four RDF graphs. Our results show that samples containing 0.5% of the entities in the graph allow for good estimations and may be used by downstream tasks such as query plan optimization in decentralized querying.
UR - http://www.scopus.com/inward/record.url?scp=85086144610&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-49461-2_10
DO - 10.1007/978-3-030-49461-2_10
M3 - Conference contribution
AN - SCOPUS:85086144610
SN - 9783030494605
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 157
EP - 175
BT - The Semantic Web - 17th International Conference, ESWC 2020, Proceedings
A2 - Harth, Andreas
A2 - Kirrane, Sabrina
A2 - Ngonga Ngomo, Axel-Cyrille
A2 - Paulheim, Heiko
A2 - Rula, Anisa
A2 - Gentile, Anna Lisa
A2 - Haase, Peter
A2 - Cochez, Michael
PB - Springer
T2 - 17th Extended Semantic Web Conference, ESWC 2020
Y2 - 31 May 2020 through 4 June 2020
ER -