Analysis of TPC-DS - The first standard benchmark for SQL-based big data systems

Meikel Poess, Tilmann Rabl, Hans Arno Jacobsen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

27 Scopus citations

Abstract

The advent of Web 2.0 companies, such as Facebook, Google, and Amazon, with their insatiable appetite for vast amounts of structured, semi-structured, and unstructured data, triggered the development of Hadoop and related tools, e.g., YARN, MapReduce, and Pig, as well as NoSQL databases. These tools form an open source software stack to support the processing of large and diverse data sets on clustered systems to perform decision support tasks. Recently, SQL has been resurging in many of these solutions, e.g., Hive, Stinger, Impala, Shark, and Presto. At the same time, RDBMS vendors are adding Hadoop support to their SQL engines, e.g., IBM's Big SQL, Actian's Vortex, Oracle's Big Data SQL, and SAP's HANA. Because there was no industry standard benchmark that could measure the performance of SQL-based big data solutions, marketing claims were mostly based on "cherry picked" subsets of the TPC-DS benchmark chosen to suit individual companies' strengths while blending out their weaknesses. In this paper, we present and analyze our work on modifying TPC-DS to fill the void for an industry standard benchmark that is able to measure the performance of SQL-based big data solutions. The new benchmark was ratified by the TPC in early 2016. To show the significance of the new benchmark, we analyze performance data obtained on four different systems running big data, traditional RDBMS, and columnar in-memory architectures.

Original language: English
Title of host publication: SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 573-585
Number of pages: 13
ISBN (Electronic): 9781450350280
State: Published - 24 Sep 2017
Event: 2017 Symposium on Cloud Computing, SoCC 2017 - Santa Clara, United States
Duration: 24 Sep 2017 to 27 Sep 2017

Publication series

Name: SoCC 2017 - Proceedings of the 2017 Symposium on Cloud Computing

Conference

Conference: 2017 Symposium on Cloud Computing, SoCC 2017
Country/Territory: United States
City: Santa Clara
Period: 24/09/17 to 27/09/17

Keywords

  • Benchmark
  • Big data
  • TPC-DS
