TPC-DI: The first industry benchmark for data integration

  • Oracle America, Inc.
  • University of Toronto
  • IBM Research Austin

Research output: Contribution to journal / Conference article / Peer-review

68 Scopus citations

Abstract

Historically, the process of synchronizing a decision support system with data from operational systems has been referred to as Extract, Transform, Load (ETL), and the tools supporting such a process have been referred to as ETL tools. Recently, ETL has been replaced by the more comprehensive term data integration (DI). DI describes the process of extracting and combining data from a variety of data source formats, transforming that data into a unified data model representation, and loading it into a data store. This is done in a variety of scenarios: data acquisition for business intelligence, analytics, and data warehousing, but also synchronization of data between operational applications, data migrations and conversions, master data management, enterprise data sharing, and delivery of data services in a service-oriented architecture context, among others. Because these scenarios rely on up-to-date information, it is critical to implement a high-performing, scalable, and easy-to-maintain data integration system. This is especially important as the complexity, variety, and volume of data constantly increase, making the performance of data integration systems ever more critical. Despite the significance of a high-performing DI system, there has been no industry standard for measuring and comparing the performance of such systems. Acknowledging this void, the TPC (Transaction Processing Performance Council) has released TPC-DI, an innovative benchmark for data integration. This paper explains the motivation for its development, describes its main characteristics, including workload, run rules, and metric, and explains key decisions.
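The abstract defines DI as extracting and combining data from several source formats, transforming it into a unified data model, and loading it into a data store. The following is a minimal Python sketch of that sequence only; it is not the TPC-DI workload or any reference implementation. The two inline sources, the `Customer` model, and the `dim_customer` table are hypothetical, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import json
import sqlite3
from dataclasses import dataclass


# Hypothetical unified data model: one customer record, whatever the source format.
@dataclass
class Customer:
    customer_id: int
    name: str
    country: str


# Hypothetical source 1: a CSV export with its own column names.
CSV_SOURCE = "id,full_name,country_code\n1,Alice Chen,GB\n2,Bob Diaz,gb\n"

# Hypothetical source 2: a JSON feed with different field names and a split name.
JSON_SOURCE = '[{"cust": 3, "first": "Carol", "last": "Evans", "ctry": "us"}]'


def extract_csv(text):
    """Extract: parse CSV rows into dicts keyed by the source's header."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: parse a JSON array of objects."""
    return json.loads(text)


def transform_csv(row):
    """Transform: map CSV fields onto the unified model, normalizing country codes."""
    return Customer(int(row["id"]), row["full_name"].strip(), row["country_code"].upper())


def transform_json(obj):
    """Transform: combine the split name fields and normalize country codes."""
    return Customer(int(obj["cust"]), f'{obj["first"]} {obj["last"]}', obj["ctry"].upper())


def load(conn, customers):
    """Load: write the unified records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(c.customer_id, c.name, c.country) for c in customers],
    )
    conn.commit()


def run_pipeline(conn):
    """Extract from both sources, transform into one model, load, return row count."""
    customers = [transform_csv(r) for r in extract_csv(CSV_SOURCE)]
    customers += [transform_json(o) for o in extract_json(JSON_SOURCE)]
    load(conn, customers)
    return len(customers)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn))  # prints 3
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_id"):
        print(row)
```

Keeping one transform function per source isolates format-specific mapping rules, so adding a new source format means adding an extract/transform pair without touching the load step.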

Original language: English
Pages (from-to): 1367-1378
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 13
State: Published, 2014
Externally published: Yes
Event: Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014, Hangzhou, China
Duration: 1 Sep 2014 to 5 Sep 2014
