RDF-3X: A RISC-style engine for RDF

Thomas Neumann, Gerhard Weikum

Research output: Contribution to journal › Article › peer-review

455 Scopus citations

Abstract

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries, including those with long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style design: a streamlined architecture with carefully designed, puristic data structures and operations. The salient points of RDF-3X are: 1) a generic solution for storing and indexing RDF triples that completely eliminates the need for physical-design tuning, 2) a powerful yet simple query processor that leverages fast merge joins to the largest possible extent, and 3) a query optimizer for choosing optimal join orders using a cost model based on statistical synopses for entire join paths. The performance of RDF-3X, in comparison to the previously best state-of-the-art systems, has been measured on several large-scale datasets with more than 50 million RDF triples and on benchmark queries that include pattern matching and long join paths in the underlying data graphs.
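
To make points 1) and 2) of the abstract concrete, the following is a minimal Python sketch, not the RDF-3X implementation. It assumes that triples are kept in sorted lists, one per ordering of subject, predicate, and object, so that any triple pattern can be read in sorted order and two patterns sharing a variable can be combined with a merge join. The names `TRIPLES`, `build_indexes`, `scan`, and `merge_join` are hypothetical, the sample data is invented for illustration, and the linear `scan` stands in for the range lookup a real index would provide.

```python
from itertools import permutations

# Invented toy data; a real engine would store encoded ids, not strings.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]


def build_indexes(triples):
    """Return one sorted list per ordering of (s, p, o), keyed e.g. 'pos'."""
    indexes = {}
    for order in permutations("spo"):
        positions = ["spo".index(c) for c in order]
        indexes["".join(order)] = sorted(
            tuple(t[i] for i in positions) for t in triples
        )
    return indexes


def scan(index, prefix):
    """Yield the remaining components of entries that start with prefix.

    Output stays sorted because the index is sorted. The linear pass is
    a simplification of this sketch, not a claim about the engine.
    """
    n = len(prefix)
    for entry in index:
        if entry[:n] == prefix:
            yield entry[n:]


def merge_join(left, right):
    """Join two lists sorted on their first component, duplicates included."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append((lk,) + l[1:] + r[1:])
            i, j = i_end, j_end
    return out


# Pattern pair: ?x knows ?y . ?y worksAt ?z  (join variable: ?y)
indexes = build_indexes(TRIPLES)
left = list(scan(indexes["pos"], ("knows",)))     # (?y, ?x), sorted by ?y
right = list(scan(indexes["pso"], ("worksAt",)))  # (?y, ?z), sorted by ?y
result = merge_join(left, right)                  # rows of (?y, ?x, ?z)
```

In this sketch, picking the `pos` ordering for the first pattern and `pso` for the second delivers both inputs already sorted on the join variable, so no sort step is needed before the merge. Choosing which patterns to join first is the job of the optimizer named in point 3) and is not modeled here.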

Original language: English
Pages (from-to): 647-659
Number of pages: 13
Journal: Proceedings of the VLDB Endowment
Volume: 1
Issue number: 1
State: Published - 2008
Externally published: Yes
