TY - GEN
T1 - Scalable join processing on very large RDF graphs
AU - Neumann, Thomas
AU - Weikum, Gerhard
PY - 2009
Y1 - 2009
N2 - With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.
AB - With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very light-weight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimations for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.
UR - http://www.scopus.com/inward/record.url?scp=70849136081&partnerID=8YFLogxK
U2 - 10.1145/1559845.1559911
DO - 10.1145/1559845.1559911
M3 - Conference contribution
AN - SCOPUS:70849136081
SN - 9781605585543
T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
SP - 627
EP - 639
BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Y2 - 29 June 2009 through 2 July 2009
ER -