Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms

Alexander Heinecke, Michael Bader

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

12 Scopus citations

Abstract

We present a parallel implementation of a cache oblivious algorithm for matrix multiplication on multicore platforms. The algorithm is based on a storage scheme and a block-recursive approach for multiplication, which are both based on a Peano space-filling curve. The recursion is stopped on matrix blocks with a size that needs to perfectly match the size of the Ll cache of the underlying CPU. The respective block multiplications are implemented by multiplication kernels that are hand-optimised for the SIMD units of current x86 CPUs. The Peano storage scheme is used to partition the block multiplications to different cores. Performance tests on various multicore platforms with up to 16 cores and different memory architecture show that the resulting implementation leads to better parallel scalability than achieved by Intel's MKL or GotoBLAS, and can outperform both libraries in terms of absolute performance on eight or more cores.

Original languageEnglish
Title of host publicationConference on Computing Frontiers - Proceedings of the 2008 Workshop on Memory Access on Future Processors
Subtitle of host publicationA Solved Problem MAW'08
Pages385-392
Number of pages8
DOIs
StatePublished - 2008
Event2008 Workshop on Memory Access on Future Processors: A Solved Problem MAW'08 - Ischia, Italy
Duration: 5 May 20087 May 2008

Publication series

NameConference on Computing Frontiers - Proceedings of the 2008 Workshop on Memory Access on Future Processors: A Solved Problem MAW'08

Conference

Conference2008 Workshop on Memory Access on Future Processors: A Solved Problem MAW'08
Country/TerritoryItaly
CityIschia
Period5/05/087/05/08

Keywords

  • Cache oblivious algorithms
  • Matrix multiplication
  • Multicore
  • Parallelisation
  • Space-filling curves

Fingerprint

Dive into the research topics of 'Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms'. Together they form a unique fingerprint.

Cite this