Just can't get enough - Synthesizing big data

Tilmann Rabl, Manuel Danisch, Michael Frank, Hans Arno Jacobsen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

16 Scopus citations

Abstract

With the rapidly decreasing prices for storage and storage systems, ever larger data sets become economical. While only a few years ago only successful transactions would be recorded in sales systems, today every user interaction is stored for ever deeper analysis and richer user modeling. This has led to the development of big data systems, which offer high scalability and novel forms of analysis. Due to the rapid development and ever increasing variety of the big data landscape, there is a pressing need for tools for testing and benchmarking. Vendors have few options to showcase the performance of their systems other than using trivial data sets like TeraSort or WordCount. Since customers' real data is typically subject to privacy regulations and can rarely be utilized, simplistic proofs of concept have to be used, leaving both customers and vendors unclear about the target use-case performance. As a solution, we present an automatic approach to data synthetization from existing data sources. Our system enables fully automatic generation of large amounts of complex, realistic, synthetic data.

Original language: English
Title of host publication: SIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 1457-1462
Number of pages: 6
ISBN (Electronic): 9781450327589
DOIs
State: Published - 27 May 2015
Externally published: Yes
Event: ACM SIGMOD International Conference on Management of Data, SIGMOD 2015 - Melbourne, Australia
Duration: 31 May 2015 to 4 Jun 2015

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume: 2015-May
ISSN (Print): 0730-8078

Conference

Conference: ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Country/Territory: Australia
City: Melbourne
Period: 31/05/15 to 4/06/15
