TY - GEN
T1 - Scalable Infrastructure for Workload Characterization of Cluster Traces
AU - van Loo, Thomas
AU - Jindal, Anshul
AU - Benedict, Shajulin
AU - Chadha, Mohak
AU - Gerndt, Michael
N1 - Publisher Copyright: Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
PY - 2022
Y1 - 2022
AB - In the recent past, workload characterization has been attempted to gain a foothold in the emerging serverless cloud market, especially in the large production cloud clusters of Google, AWS, and so forth. While analyzing and characterizing real workloads from a large production cloud cluster benefits cloud providers, researchers, and daily users, analyzing the workload traces of these clusters has been an arduous task due to the heterogeneous nature of the data. This article proposes a scalable infrastructure based on Google’s Dataproc for analyzing the workload traces of cloud environments. We evaluate the proposed infrastructure using the Google Cloud cluster-usage-traces-v3 workload traces. We perform workload characterization on this dataset, focusing on the heterogeneity of the workload, the variations in job durations, aspects of resource consumption, and the overall availability of resources provided by the cluster. The findings reported in the paper will benefit cloud infrastructure providers and users in managing cloud computing resources, especially on serverless platforms.
KW - Cloud Computing
KW - Dataproc
KW - Google Cloud
KW - Google Cluster Traces
KW - Scalable
KW - Workload Characterization
UR - http://www.scopus.com/inward/record.url?scp=85141080655&partnerID=8YFLogxK
U2 - 10.5220/0011080300003200
DO - 10.5220/0011080300003200
M3 - Conference contribution
AN - SCOPUS:85141080655
T3 - International Conference on Cloud Computing and Services Science, CLOSER - Proceedings
SP - 254
EP - 263
BT - Proceedings of the 12th International Conference on Cloud Computing and Services Science, CLOSER 2022
A2 - van Steen, Maarten
A2 - Ferguson, Donald
A2 - Pahl, Claus
PB - Science and Technology Publications, Lda
T2 - 12th International Conference on Cloud Computing and Services Science, CLOSER 2022
Y2 - 27 April 2022 through 29 April 2022
ER -