Porting existing cache-oblivious linear algebra HPC modules to larrabee architecture

Alexander Heinecke, Carsten Trinitis, Josef Weidendorfer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

Cache-obliviousness is an important but relatively new concept in cache optimization. Because cache-oblivious algorithms perform well on architectures with arbitrary cache configurations, the programming effort required for porting to and optimizing for future architectures can be significantly reduced. In [8] and [9], fast parallel cache-oblivious linear algebra modules were presented. The underlying matrix storage schemes are based on space-filling curves. For matrix multiplication, all cache misses can be avoided, whereas for the LU decomposition algorithm the number of cache misses is minimized. It has been shown that the resulting codes work very well on several kinds of systems, ranging from laptops to supercomputers. In this paper, we show that the runtime characteristics of our existing cache-oblivious codes can be preserved on newer Intel processors. Special emphasis is put on the first many-core processor architecture with complete hardware-based cache coherency: the Larrabee architecture. As the latter is expected to be available as a PCIe card connected to the host system, porting had to take into account the transfer of data structures between different memory address spaces. Unfortunately, Larrabee was canceled as a graphics device for 2010, but Intel is expected to outline further steps regarding Larrabee during 2010.
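The abstract does not say which space-filling curve the modules in [8] and [9] use, so the following is only a minimal sketch of the general idea, assuming a Z-order (Morton) layout as a stand-in and a square matrix whose size is a power of two. All function names are hypothetical and are not taken from the paper. The point it illustrates is that a curve-based layout keeps each quadrant contiguous in memory, so a recursive block multiplication works on ever smaller contiguous ranges without knowing any cache size.

```python
def morton_index(row, col, bits):
    """Interleave the bits of row and col into a Z-order index.

    Row bits go to the odd positions, column bits to the even ones.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)
        index |= ((row >> b) & 1) << (2 * b + 1)
    return index


def to_morton(matrix):
    """Copy a 2^k x 2^k list-of-lists matrix into a flat Z-order list."""
    n = len(matrix)
    bits = n.bit_length() - 1  # n is assumed to be a power of two
    flat = [0] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[morton_index(r, c, bits)] = matrix[r][c]
    return flat


def multiply_add(a, b, c, a_off, b_off, c_off, n):
    """Compute C += A * B for n x n Z-order blocks at the given offsets.

    The recursion never refers to a cache size: it halves the blocks
    until a single element remains.
    """
    if n == 1:
        c[c_off] += a[a_off] * b[b_off]
        return
    h = n // 2
    q = h * h  # elements per quadrant; quadrant (i, j) starts at (2*i + j) * q
    for i in range(2):
        for j in range(2):
            for k in range(2):
                multiply_add(a, b, c,
                             a_off + (2 * i + k) * q,
                             b_off + (2 * k + j) * q,
                             c_off + (2 * i + j) * q,
                             h)
```

A caller would convert both operands with `to_morton`, allocate a zero-filled result list of the same length, and call `multiply_add(a, b, c, 0, 0, 0, n)`. A production code would stop the recursion at a small block and use a vectorized kernel there; the element-level base case here only keeps the sketch short.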

Original language: English
Title of host publication: CF 2010 - Proceedings of the 2010 Computing Frontiers Conference
Pages: 91-92
Number of pages: 2
State: Published - 2010
Event: 7th ACM International Conference on Computing Frontiers, CF'10 - Bertinoro, Italy
Duration: 17 May 2010 to 19 May 2010

Publication series

Name: CF 2010 - Proceedings of the 2010 Computing Frontiers Conference

Conference

Conference: 7th ACM International Conference on Computing Frontiers, CF'10
Country/Territory: Italy
City: Bertinoro
Period: 17/05/10 to 19/05/10

Keywords

  • accelerator
  • space-filling curve
  • cache-oblivious
  • lu decomposition
  • manycore
  • matrix multiplication
  • openmp
