TY - JOUR
T1 - Unlocking Efficiency in BNNs
T2 - Global by Local Thresholding for Analog-Based HW Accelerators
AU - Yayla, Mikail
AU - Frustaci, Fabio
AU - Spagnolo, Fanny
AU - Chen, Jian-Jia
AU - Amrouch, Hussam
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators, utilizing XNOR gates and additional interface circuits, have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase the inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the two paths is selectively triggered. Ideally, since inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA). It approximates the global thresholdings in BNNs by local thresholdings. This enables the use of the AP throughout most of the execution, which significantly increases the interface circuit efficiency. In our evaluations with two BNN architectures, using LTA reduces the area by 42x and 54x, the energy by 2.7x and 4.2x, and the latency by 3.8x and 1.15x, compared to state-of-the-art crossbar-based BNN accelerators.
AB - For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators, utilizing XNOR gates and additional interface circuits, have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase the inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the two paths is selectively triggered. Ideally, since inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA). It approximates the global thresholdings in BNNs by local thresholdings. This enables the use of the AP throughout most of the execution, which significantly increases the interface circuit efficiency. In our evaluations with two BNN architectures, using LTA reduces the area by 42x and 54x, the energy by 2.7x and 4.2x, and the latency by 3.8x and 1.15x, compared to state-of-the-art crossbar-based BNN accelerators.
KW - Neural networks
KW - approximate computing
KW - error tolerance
KW - hardware design
UR - http://www.scopus.com/inward/record.url?scp=85171746554&partnerID=8YFLogxK
U2 - 10.1109/JETCAS.2023.3315561
DO - 10.1109/JETCAS.2023.3315561
M3 - Article
AN - SCOPUS:85171746554
SN - 2156-3357
VL - 13
SP - 940
EP - 955
JO - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
JF - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IS - 4
ER -