TY - JOUR
T1 - Comprehensive analysis of high-performance computing methods for filtered back-projection
AU - Mendl, Christian B.
AU - Eliuk, Steven
AU - Noga, Michelle
AU - Boulanger, Pierre
PY - 2012
Y1 - 2012
N2 - This paper provides an extensive runtime, accuracy, and noise analysis of Computed Tomography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks: "conventional" multi-core, multi-threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and texture filtering. We compare implementations of the Filtered Back-Projection (FBP) algorithm with fan-beam geometry for all frameworks. The accuracy of the reconstruction is validated using an ACR-accredited phantom, with the raw attenuation data acquired by a clinical CT scanner. Our analysis shows that a single GPU can run an FBP reconstruction 23 times faster than a 64-core multi-threaded CPU machine for a 1024 × 1024 image. Moreover, directly programming the graphics pipeline using DirectX or OpenGL can further increase performance compared to a CUDA implementation.
AB - This paper provides an extensive runtime, accuracy, and noise analysis of Computed Tomography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks: "conventional" multi-core, multi-threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and texture filtering. We compare implementations of the Filtered Back-Projection (FBP) algorithm with fan-beam geometry for all frameworks. The accuracy of the reconstruction is validated using an ACR-accredited phantom, with the raw attenuation data acquired by a clinical CT scanner. Our analysis shows that a single GPU can run an FBP reconstruction 23 times faster than a 64-core multi-threaded CPU machine for a 1024 × 1024 image. Moreover, directly programming the graphics pipeline using DirectX or OpenGL can further increase performance compared to a CUDA implementation.
KW - Image reconstruction - analytical methods
KW - Parallel computing
KW - X-ray imaging and computed tomography
UR - http://www.scopus.com/inward/record.url?scp=84877356948&partnerID=8YFLogxK
U2 - 10.5565/rev/elcvia.508
DO - 10.5565/rev/elcvia.508
M3 - Article
AN - SCOPUS:84877356948
SN - 1577-5097
VL - 12
SP - 1
EP - 16
JO - Electronic Letters on Computer Vision and Image Analysis
JF - Electronic Letters on Computer Vision and Image Analysis
IS - 1
ER -