Model-Based Performance Evaluation of Batch and Stream Applications for Big Data

Johannes Krob, Helmut Krcmar

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

13 Scopus citations

Abstract

Batch and stream processing are the two main approaches implemented by big data systems such as Apache Spark and Apache Flink. Although only stream applications are intended to satisfy real-time requirements, both approaches must meet certain response time constraints. In addition, cluster architectures are continuously expanding, and computing resources represent high investments and expenses for organizations. Therefore, planning the required capacities and predicting response times is crucial. In this work, we present a performance modeling and simulation approach that uses and extends the Palladio component model. We predict performance metrics of batch and stream applications and their underlying processing systems, using Apache Spark on Apache Hadoop as an example. Whereas most related work concentrates on one specific processing technique and focuses on response time as the metric, we propose a general approach that also considers resource utilization. We evaluated our approach in several experiments using applications and data workloads from the HiBench benchmark suite. The results indicate accurate predictions for increasing cluster sizes and workloads, with errors of less than 18%.
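The reported accuracy is expressed as a prediction error. The following Python sketch shows one common way to compute such a figure, assuming the error is the relative deviation |predicted - measured| / measured. The paper's exact error definition, the function names (`relative_error`, `within_threshold`), and the sample values are illustrative assumptions, not the authors' method or data.

```python
def relative_error(predicted: float, measured: float) -> float:
    """Relative prediction error |predicted - measured| / measured (assumed definition)."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / measured


def within_threshold(pairs, threshold=0.18):
    """Return True if every (predicted, measured) pair has a relative error below threshold."""
    return all(relative_error(p, m) < threshold for p, m in pairs)


# Hypothetical (predicted, measured) response times in seconds; NOT data from the paper.
samples = [(105.0, 100.0), (47.0, 52.0)]
for predicted, measured in samples:
    err = relative_error(predicted, measured)
    print(f"predicted={predicted}s measured={measured}s error={err:.1%}")
print(within_threshold(samples))
```

Applied to the hypothetical samples, the script reports errors of 5.0% and 9.6% and prints `True`, since both lie below the 18% bound quoted in the abstract.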

Original language: English
Title of host publication: Proceedings - 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 80-86
Number of pages: 7
ISBN (Electronic): 9781538627631
State: Published - 13 Nov 2017
Event: 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017, Banff, Canada
Duration: 20 Sep 2017 to 22 Sep 2017

Publication series

Name: Proceedings - 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017

Conference

Conference: 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017
Country/Territory: Canada
City: Banff
Period: 20/09/17 to 22/09/17

Keywords

  • big data
  • modeling
  • performance
  • simulation

