TY - GEN
T1 - Performance evaluation and optimization of random memory access on multicores with high productivity
AU - Saxena, Vaibhav
AU - Sabharwal, Yogish
AU - Bhatotia, Pramod
PY - 2010
Y1 - 2010
N2 - The slow progress in memory access latencies in comparison to CPU speeds has resulted in memory accesses dominating code performance. While architectural enhancements have benefited applications with data locality and sequential access, random memory access still remains a cause for concern. Several benchmarks have been proposed to evaluate the random memory access performance on multicore architectures. However, the performance evaluation models used by the existing benchmarks do not fully capture the varying types of random access behaviour arising in practical applications. In this paper, we propose a new model for evaluating the performance of random memory access that better captures the random access behaviour demonstrated by applications in practice. We use our model to evaluate the performance of two popular multicore architectures, the Cell and the GPU. We also suggest novel optimizations on these architectures that significantly boost the performance for random accesses in comparison to conventional architectures. Performance improvements on these architectures typically come at the cost of reduced productivity considering the extra programming effort involved. To address this problem, we propose libraries that incorporate these optimizations and provide innovatively designed programming interfaces that can be used by the applications to achieve good performance without loss of productivity.
UR - http://www.scopus.com/inward/record.url?scp=79952800435&partnerID=8YFLogxK
U2 - 10.1109/HIPC.2010.5713168
DO - 10.1109/HIPC.2010.5713168
M3 - Conference contribution
AN - SCOPUS:79952800435
SN - 9781424485185
T3 - 17th International Conference on High Performance Computing, HiPC 2010
BT - 17th International Conference on High Performance Computing, HiPC 2010
PB - IEEE Computer Society
T2 - 17th International Conference on High Performance Computing, HiPC 2010
Y2 - 19 December 2010 through 22 December 2010
ER -