TY - GEN
T1 - Parallelisation of block-recursive matrix multiplication in prefix computations
AU - Bader, Michael
AU - Hanigk, Sebastian
AU - Huckle, Thomas
PY - 2008
Y1 - 2008
N2 - We present a performance and scalability study on the parallelisation of a series of matrix multiplications within a quantum control problem. As a time-critical sub-task, all matrix products A1.Ak for matrices A1,.,Am have to be computed (prefix problem). The parallelisation can either be coarse-grain, where parallel prefix computation is used to concurrently execute the individual matrix multiplications; however, there is also the obvious fine-grain parallelisation approach to parallelise the subsequent matrix multiplications, and mixtures of these two principal approaches are possible. We compare two parallel multiplication algorithms that are based on block-structuring-SRUMMA and a modified, block-recursive approach based on Peano space-filling curves-and study their efficiency on a compute cluster with 128 processors. The Peano approach proves to be advantageous especially for moderately sized matrices. Hence, we compare the Peano algorithm's performance for coarse-grain, fine-grain, and hybrid parallelisation of the prefix problem. The Peano algorithm proves to be scalable enough to allow a purely fine-grain parallelisation of the prefix problem.
AB - We present a performance and scalability study on the parallelisation of a series of matrix multiplications within a quantum control problem. As a time-critical sub-task, all matrix products A1.Ak for matrices A1,.,Am have to be computed (prefix problem). The parallelisation can either be coarse-grain, where parallel prefix computation is used to concurrently execute the individual matrix multiplications; however, there is also the obvious fine-grain parallelisation approach to parallelise the subsequent matrix multiplications, and mixtures of these two principal approaches are possible. We compare two parallel multiplication algorithms that are based on block-structuring-SRUMMA and a modified, block-recursive approach based on Peano space-filling curves-and study their efficiency on a compute cluster with 128 processors. The Peano approach proves to be advantageous especially for moderately sized matrices. Hence, we compare the Peano algorithm's performance for coarse-grain, fine-grain, and hybrid parallelisation of the prefix problem. The Peano algorithm proves to be scalable enough to allow a purely fine-grain parallelisation of the prefix problem.
UR - http://www.scopus.com/inward/record.url?scp=84906487924&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84906487924
SN - 9781586037963
T3 - Advances in Parallel Computing
SP - 175
EP - 184
BT - Parallel Computing
PB - IOS Press BV
ER -