Integration of Apache Spark with Invasive Resource Manager

Jeeta Ann Chacko, Isaías A. Comprés Ureña, Michael Gerndt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Scheduling resources on High-Performance Computing (HPC) systems for compute-intensive scientific applications often leaves nodes idle at different points in time, because each application has different node requirements. One option to optimize node usage is to assign all idle nodes to data analytics applications. Scientific applications generate output data that can serve as input for data analytics applications, so it is beneficial to run both types of applications on the same HPC system. The Invasive Resource Manager (IRM) is an extension of the Simple Linux Utility for Resource Management (SLURM) that adds dynamic resource management and scheduling capabilities on HPC systems. Apache Spark is an open-source cluster computing framework that is widely used for data analytics applications. This project integrates Apache Spark with the IRM so that data analytics applications can run on HPC systems with dynamic resource allocation. This work also collects performance data from Spark applications and improves the existing scheduling strategy of the IRM. The integrated system is deployed on SuperMUC, the supercomputer at the Leibniz Supercomputing Centre in Germany, for testing and evaluation. This project illustrates a design for integrating data analytics on HPC systems with the additional advantage of improving resource utilization. The evaluation shows complete utilization of idle nodes by Spark applications.

Original language: English
Title of host publication: Proceedings - 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People and Smart City Innovation, SmartWorld/UIC/ATC/SCALCOM/IOP/SCI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1553-1560
Number of pages: 8
ISBN (Electronic): 9781728140346
DOIs
State: Published - Aug 2019
Event: 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People and Smart City Innovation, SmartWorld/UIC/ATC/SCALCOM/IOP/SCI 2019 - Leicester, United Kingdom
Duration: 19 Aug 2019 to 23 Aug 2019

Publication series

Name: Proceedings - 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People and Smart City Innovation, SmartWorld/UIC/ATC/SCALCOM/IOP/SCI 2019

Conference

Conference: 2019 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Internet of People and Smart City Innovation, SmartWorld/UIC/ATC/SCALCOM/IOP/SCI 2019
Country/Territory: United Kingdom
City: Leicester
Period: 19/08/19 to 23/08/19

Keywords

  • Apache Spark
  • Data Analytics
  • Dynamic Resource Allocation
  • Elastic Scheduling
  • High Performance Computing