TY - GEN
T1 - In-NoC circuits for low-latency cache coherence in distributed shared-memory architectures
AU - Masing, Leonard
AU - Srivatsa, Akshay
AU - Kreß, Fabian
AU - Anantharajaiah, Nidhi
AU - Herkersdorf, Andreas
AU - Becker, Jurgen
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/16
Y1 - 2018/11/16
N2 - Scalable communication and low-latency memory accesses are the deciding factors for future manycore performance. An efficient hardware infrastructure is required, since raw performance must be balanced with area and power constraints. In distributed shared-memory (DSM) architectures, caches help in reducing costly remote accesses but must be kept coherent. To enable scalable coherence in manycore systems, the recently proposed region-based cache coherence defines configurable regions, i.e. cache-coherent sub-sections of a manycore architecture. In this paper, a technique for supporting the region-based cache coherence mechanism by using so-called in-NoC circuits (INCs) in a hybrid network-on-chip is proposed. These circuits are automatically established based on traffic monitoring and traffic analysis to connect nodes (i.e. routers) in the network to enable a shortcut for packets, reducing their latency. The INCs can be used by packets stemming from different sources and targeting different destinations, in contrast to traditional end-to-end circuits. Depending on the coherence region, our evaluations of several benchmarks show a latency reduction of up to 45% on average in a 4x4 mesh that further increases with the mesh size. The FPGA synthesis of a router from a scientific DSM architecture that was extended with the presented features shows additional costs of up to 31% more LUTs and 20% more flip-flops.
AB - Scalable communication and low-latency memory accesses are the deciding factors for future manycore performance. An efficient hardware infrastructure is required, since raw performance must be balanced with area and power constraints. In distributed shared-memory (DSM) architectures, caches help in reducing costly remote accesses but must be kept coherent. To enable scalable coherence in manycore systems, the recently proposed region-based cache coherence defines configurable regions, i.e. cache-coherent sub-sections of a manycore architecture. In this paper, a technique for supporting the region-based cache coherence mechanism by using so-called in-NoC circuits (INCs) in a hybrid network-on-chip is proposed. These circuits are automatically established based on traffic monitoring and traffic analysis to connect nodes (i.e. routers) in the network to enable a shortcut for packets, reducing their latency. The INCs can be used by packets stemming from different sources and targeting different destinations, in contrast to traditional end-to-end circuits. Depending on the coherence region, our evaluations of several benchmarks show a latency reduction of up to 45% on average in a 4x4 mesh that further increases with the mesh size. The FPGA synthesis of a router from a scientific DSM architecture that was extended with the presented features shows additional costs of up to 31% more LUTs and 20% more flip-flops.
KW - Cache coherence
KW - Distributed shared-memory
KW - Manycore
KW - Networks-on-chip
UR - http://www.scopus.com/inward/record.url?scp=85059765261&partnerID=8YFLogxK
U2 - 10.1109/MCSoC2018.2018.00033
DO - 10.1109/MCSoC2018.2018.00033
M3 - Conference contribution
AN - SCOPUS:85059765261
T3 - Proceedings - 2018 IEEE 12th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2018
SP - 138
EP - 145
BT - Proceedings - 2018 IEEE 12th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2018
Y2 - 12 September 2018 through 14 September 2018
ER -