TY - GEN
T1 - Big data generation
AU - Rabl, Tilmann
AU - Jacobsen, Hans Arno
PY - 2014
Y1 - 2014
N2 - Big data challenges are end-to-end problems. Big data usually has to be preprocessed, moved, loaded, processed, and stored many times, which has led to the creation of big data pipelines. Current big data benchmarks focus only on isolated aspects of this pipeline, usually processing, storage, and loading. To date, no benchmark has been presented that covers the end-to-end aspect of big data systems. In this paper, we discuss the necessity of ETL-like tasks in big data benchmarking and propose the Parallel Data Generation Framework (PDGF) for data generation. PDGF is a generic data generator that was implemented at the University of Passau and is currently adopted in TPC benchmarks.
AB - Big data challenges are end-to-end problems. Big data usually has to be preprocessed, moved, loaded, processed, and stored many times, which has led to the creation of big data pipelines. Current big data benchmarks focus only on isolated aspects of this pipeline, usually processing, storage, and loading. To date, no benchmark has been presented that covers the end-to-end aspect of big data systems. In this paper, we discuss the necessity of ETL-like tasks in big data benchmarking and propose the Parallel Data Generation Framework (PDGF) for data generation. PDGF is a generic data generator that was implemented at the University of Passau and is currently adopted in TPC benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=84958546415&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-53974-9_3
DO - 10.1007/978-3-642-53974-9_3
M3 - Conference contribution
AN - SCOPUS:84958546415
SN - 9783642539732
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 20
EP - 27
BT - Specifying Big Data Benchmarks - First Workshop, WBDB 2012, and Second Workshop, WBDB 2012, Revised Selected Papers
PB - Springer Verlag
T2 - 2nd Workshop on Specifying Big Data Benchmarks, WBDB 2012
Y2 - 17 December 2012 through 18 December 2012
ER -