TY - GEN

T1 - Parallelisation of block-recursive matrix multiplication in prefix computations

AU - Bader, Michael

AU - Hanigk, Sebastian

AU - Huckle, Thomas

PY - 2008

Y1 - 2008

N2 - We present a performance and scalability study on the parallelisation of a series of matrix multiplications within a quantum control problem. As a time-critical sub-task, all matrix products A1.Ak for matrices A1,.,Am have to be computed (prefix problem). The parallelisation can either be coarse-grain, where parallel prefix computation is used to concurrently execute the individual matrix multiplications; however, there is also the obvious fine-grain parallelisation approach to parallelise the subsequent matrix multiplications, and mixtures of these two principal approaches are possible. We compare two parallel multiplication algorithms that are based on block-structuring-SRUMMA and a modified, block-recursive approach based on Peano space-filling curves-and study their efficiency on a compute cluster with 128 processors. The Peano approach proves to be advantageous especially for moderately sized matrices. Hence, we compare the Peano algorithm's performance for coarse-grain, fine-grain, and hybrid parallelisation of the prefix problem. The Peano algorithm proves to be scalable enough to allow a purely fine-grain parallelisation of the prefix problem.

AB - We present a performance and scalability study on the parallelisation of a series of matrix multiplications within a quantum control problem. As a time-critical sub-task, all matrix products A1.Ak for matrices A1,.,Am have to be computed (prefix problem). The parallelisation can either be coarse-grain, where parallel prefix computation is used to concurrently execute the individual matrix multiplications; however, there is also the obvious fine-grain parallelisation approach to parallelise the subsequent matrix multiplications, and mixtures of these two principal approaches are possible. We compare two parallel multiplication algorithms that are based on block-structuring-SRUMMA and a modified, block-recursive approach based on Peano space-filling curves-and study their efficiency on a compute cluster with 128 processors. The Peano approach proves to be advantageous especially for moderately sized matrices. Hence, we compare the Peano algorithm's performance for coarse-grain, fine-grain, and hybrid parallelisation of the prefix problem. The Peano algorithm proves to be scalable enough to allow a purely fine-grain parallelisation of the prefix problem.

UR - http://www.scopus.com/inward/record.url?scp=84906487924&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84906487924

SN - 9781586037963

T3 - Advances in Parallel Computing

SP - 175

EP - 184

BT - Parallel Computing

PB - IOS Press BV

ER -