Abstract
In this paper we highlight a number of factors that determine the speed of iterative time-discrete algorithms, exemplified by various TLM codes. As recently shown, one of the main factors determining code performance on state-of-the-art platforms is memory access, not the number of floating-point operations per TLM node. To show the influence of memory access, the performance of various TLM codes is compared on various platforms for two examples: a microstrip via-hole connect and a simple cavity resonator. Conventionally optimized and memory-access-optimized codes are compared, and strategies for optimizing memory access are discussed. In addition, the issue of software-emulated floating-point underflow handling in simulations is discussed. By exploiting memory access optimization strategies, one can achieve a speed-up of the code of up to 100%. However, the optimization strategies are somewhat dependent upon the compiler. If memory access is suitably optimized, TLM codes show similar performance on Pentium-based PCs and workstations, independently of the employed operating system and compiler. The same code still runs faster on workstations than on PCs with similar clock speed because of the workstations' higher memory bus clock frequency and more aggressive out-of-order execution and branch prediction.
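The abstract does not say which memory access strategies the paper uses. As a generic illustration only, and not the authors' code, the following C sketch shows one widely used idea: ordering nested loops so that the innermost loop walks consecutive addresses. The flat array layout, the helper `idx`, and the function names `sum_strided` and `sum_contiguous` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed (hypothetical) flat layout: the x index varies fastest in memory. */
static size_t idx(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return x + nx * (y + ny * z);
}

/* Cache-unfriendly order: each innermost step jumps nx*ny elements. */
double sum_strided(const double *v, size_t nx, size_t ny, size_t nz)
{
    assert(v != NULL);
    double s = 0.0;
    for (size_t x = 0; x < nx; ++x)
        for (size_t y = 0; y < ny; ++y)
            for (size_t z = 0; z < nz; ++z)
                s += v[idx(x, y, z, nx, ny)];
    return s;
}

/* Cache-friendly order: the innermost loop visits consecutive addresses. */
double sum_contiguous(const double *v, size_t nx, size_t ny, size_t nz)
{
    assert(v != NULL);
    double s = 0.0;
    for (size_t z = 0; z < nz; ++z)
        for (size_t y = 0; y < ny; ++y)
            for (size_t x = 0; x < nx; ++x)
                s += v[idx(x, y, z, nx, ny)];
    return s;
}
```

Both functions visit the same elements and differ only in traversal order. On grids much larger than the cache, the contiguous version typically runs faster, although the size of the gap depends on the compiler and platform.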
Original language | English |
---|---|
Pages | 594-601 |
Number of pages | 8 |
State | Published - 2000 |
Event | 16th Annual Review of Progress in Applied Computational Electromagnetics (ACES 2000) - Monterey, CA, USA; Duration: 20 Mar 2000 → 24 Mar 2000 |
Conference
Conference | 16th Annual Review of Progress in Applied Computational Electromagnetics (ACES 2000) |
---|---|
City | Monterey, CA, USA |
Period | 20/03/00 → 24/03/00 |