TY - GEN
T1 - PSMalloc
T2 - 10th MEDEA Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '09, held in conjunction with the Int. Conf. on Parallel Architectures and Compilation Techniques, PACT 2009
AU - Biswas, Susmit
AU - Franklin, Diana
AU - Sherwood, Timothy
AU - Chong, Frederic T.
AU - De Supinski, Bronis R.
AU - Schulz, Martin
PY - 2009
Y1 - 2009
AB - Multicore processors have come to dominate the commodity market upon which many large-scale systems are based. The number of cores is increasing at the pace of Moore's law and, as a direct consequence, the memory available per core is decreasing, often severely limiting the problem size for programs running on such platforms. Thus, mechanisms that store data more efficiently in DRAM, increasing its effective capacity without requiring reprogramming, would dramatically increase the benefits of multicore nodes for large-scale systems. We observe that MPI programs replicate a significant amount of data across all processes. With multiple MPI tasks running on a single node, this replication leads to identical data residing in multiple locations in that node's DRAM, an ideal candidate for potential savings. We have found that most of the redundant data resides in the heap. Thus, smart memory allocation can remove this redundancy and increase the effective memory capacity. We present PSMalloc, a memory allocation library that keeps a single copy of identical pages from a set of MPI tasks. PSMalloc is implemented as a user-level library that can be linked at runtime, avoiding changes to the application or the operating system. To the best of our knowledge, our work is the first to reduce the physical memory footprint of MPI tasks in a multicore node without requiring kernel-level modifications. We experiment with four MPI applications from the ASC Sequoia benchmark suite and show a reduction in memory footprint of up to 22%, and of 11.18% on average.
UR - https://www.scopus.com/pages/publications/74549193534
U2 - 10.1145/1621960.1621968
DO - 10.1145/1621960.1621968
M3 - Conference contribution
AN - SCOPUS:74549193534
SN - 9781605588308
T3 - ACM International Conference Proceeding Series
SP - 43
EP - 48
BT - Proceedings of the 10th MEDEA Workshop on MEmory Performance
Y2 - 13 September 2009 through 13 September 2009
ER -