Scalable join processing on very large RDF graphs

Thomas Neumann, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

163 Scopus citations

Abstract

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.

Original language: English
Title of host publication: SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Pages: 627-639
Number of pages: 13
State: Published - 2009
Externally published: Yes
Event: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 - Providence, RI, United States
Duration: 29 Jun 2009 to 2 Jul 2009

Publication series

Name: SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

Conference

Conference: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Country/Territory: United States
City: Providence, RI
Period: 29/06/09 to 2/07/09
