On-Demand State Separation for Cloud Data Warehousing

Research output: Contribution to journalConference articlepeer-review

3 Scopus citations

Abstract

Moving data analysis and processing to the cloud is no longer reserved for a few companies with petabytes of data. Instead, the flexibility of on-demand resources is attracting an increasing number of customers with small to medium-sized workloads. These workloads do not occupy entire clusters but can run on single worker machines. However, picking the right worker for the job is challenging. Abstracting from worker machines, e.g., using stateless architectures, introduces overheads impacting performance. Solutions without stateless architectures resort to query restarts in the event of an adverse worker matching, wasting already achieved progress. In this paper, we propose migrating queries between workers by introducing on-demand state separation. Using state separation only when required enables maximum flexibility and performance while keeping already achieved progress. To derive the requirements for state separation, we first analyze the query state of medium-sized workloads on the example of TPC-DS SF100. Using this, we analyze the cost and describe the constraints necessary for state separation on such a workload. Furthermore, we describe the design and implementation of on-demand state separation in a compiling database system. Finally, using this implementation, we show the feasibility of our approach on TPC-DS and give a detailed analysis of the cost of query migration and state separation.

Original languageEnglish
Pages (from-to)2966-2979
Number of pages14
JournalProceedings of the VLDB Endowment
Volume15
Issue number11
DOIs
StatePublished - 2022
Event48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sep 20229 Sep 2022

Fingerprint

Dive into the research topics of 'On-Demand State Separation for Cloud Data Warehousing'. Together they form a unique fingerprint.

Cite this