TY - GEN
T1 - A comparative study of high-performance computing on the cloud
AU - Marathe, Aniruddha
AU - Harris, Rachel
AU - Lowenthal, David K.
AU - De Supinski, Bronis R.
AU - Rountree, Barry
AU - Schulz, Martin
AU - Yuan, Xin
PY - 2013
Y1 - 2013
AB - The popularity of Amazon's EC2 cloud platform has increased in recent years. However, many high-performance computing (HPC) users consider dedicated high-performance clusters, typically found in large compute centers such as those in national laboratories, to be far superior to EC2 because of the significant communication overhead of the latter. Our view is that this perspective is too narrow, and that the proper metrics for comparing high-performance clusters to EC2 are turnaround time and cost. In this paper, we compare the top-of-the-line EC2 cluster to HPC clusters at Lawrence Livermore National Laboratory (LLNL) based on turnaround time and total cost of execution. When measuring turnaround time, we include expected queue wait time on HPC clusters. Our results show that, although standard HPC clusters are, as expected, superior in raw performance, EC2 clusters may produce better turnaround times. To estimate cost, we developed a pricing model that sets node-hour prices for the (currently free) LLNL clusters relative to EC2's node-hour prices. We observe that the cost-effectiveness of running an application on a cluster depends on raw performance and application scalability.
KW - cloud
KW - cost
KW - high-performance computing
KW - turnaround time
UR - http://www.scopus.com/inward/record.url?scp=84880064788&partnerID=8YFLogxK
U2 - 10.1145/2462902.2462919
DO - 10.1145/2462902.2462919
M3 - Conference contribution
AN - SCOPUS:84880064788
SN - 9781450319102
T3 - HPDC 2013 - Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing
SP - 239
EP - 250
BT - HPDC 2013 - Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery
T2 - 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2013
Y2 - 17 June 2013 through 21 June 2013
ER -