TY - GEN
T1 - Exploiting data similarity to reduce memory footprints
AU - Biswas, Susmit
AU - De Supinski, Bronis R.
AU - Schulz, Martin
AU - Franklin, Diana
AU - Sherwood, Timothy
AU - Chong, Frederic T.
PY - 2011
Y1 - 2011
N2 - Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms to manage memory usage more efficiently - preferably transparently - could increase effective DRAM capacity and thus the benefit of multicore nodes for HPC systems. MPI application processes often exhibit significant data similarity. These data regions occupy multiple physical locations across the individual rank processes within a multicore node and thus offer a potential savings in memory capacity. These regions, primarily residing in the heap, are dynamic, which makes them difficult to manage statically. Our novel memory allocation library, SBLLmalloc, automatically identifies identical memory blocks and merges them into a single copy. Our implementation is transparent to the application and does not require any kernel modifications. Overall, we demonstrate that SBLLmalloc reduces the memory footprint of a range of MPI applications by 32.03% on average and up to 60.87%. Further, SBLLmalloc supports problem sizes for IRS over 21.36% larger than with standard memory management techniques, thus significantly increasing effective system size. Similarly, SBLLmalloc requires 43.75% fewer nodes than standard memory management techniques to solve an AMG problem.
AB - Memory size has long limited large-scale applications on high-performance computing (HPC) systems. Since compute nodes frequently do not have swap space, physical memory often limits problem sizes. Increasing core counts per chip and power density constraints, which limit the number of DIMMs per node, have exacerbated this problem. Further, DRAM constitutes a significant portion of overall HPC system cost. Therefore, instead of adding more DRAM to the nodes, mechanisms to manage memory usage more efficiently - preferably transparently - could increase effective DRAM capacity and thus the benefit of multicore nodes for HPC systems. MPI application processes often exhibit significant data similarity. These data regions occupy multiple physical locations across the individual rank processes within a multicore node and thus offer a potential savings in memory capacity. These regions, primarily residing in the heap, are dynamic, which makes them difficult to manage statically. Our novel memory allocation library, SBLLmalloc, automatically identifies identical memory blocks and merges them into a single copy. Our implementation is transparent to the application and does not require any kernel modifications. Overall, we demonstrate that SBLLmalloc reduces the memory footprint of a range of MPI applications by 32.03% on average and up to 60.87%. Further, SBLLmalloc supports problem sizes for IRS over 21.36% larger than with standard memory management techniques, thus significantly increasing effective system size. Similarly, SBLLmalloc requires 43.75% fewer nodes than standard memory management techniques to solve an AMG problem.
UR - http://www.scopus.com/inward/record.url?scp=80053259207&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.24
DO - 10.1109/IPDPS.2011.24
M3 - Conference contribution
AN - SCOPUS:80053259207
SN - 9780769543857
T3 - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
SP - 152
EP - 163
BT - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
Y2 - 16 May 2011 through 20 May 2011
ER -