TY - GEN
T1 - Resource-aware optimization of DNNs for embedded applications
AU - Frickenstein, Alexander
AU - Unger, Christian
AU - Stechele, Walter
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
AB - Despite their outstanding success in solving complex computer vision problems, Deep Neural Networks (DNNs) still require high-performance hardware for real-time inference. Therefore, they are not applicable to low-cost embedded hardware, where memory resources, computational performance, and power consumption are restricted. Furthermore, current approaches to fitting neural networks to embedded hardware are time-consuming, inducing slow development cycles. To address these drawbacks and satisfy the demands of embedded hardware, this paper proposes a computationally efficient magnitude-based pruning scheme, based on a half-interval search, combined with effective weight sharing, fixed-point quantization, and lossless compression. The proposed solution can be utilized to generate an optimized model, either with respect to memory demand or execution time. For instance, the memory demand of LeNet is compressed by about 385×. VGG16 is pruned by about 14.5×, whilst its computational costs are reduced by about 1.6× for a CPU-based application and 4.8× for an FPGA-based one.
KW - CNN
KW - Coding
KW - Embedded HW
KW - Pruning
KW - Quantization
KW - Weight-Sharing
UR - http://www.scopus.com/inward/record.url?scp=85070981183&partnerID=8YFLogxK
U2 - 10.1109/CRV.2019.00011
DO - 10.1109/CRV.2019.00011
M3 - Conference contribution
AN - SCOPUS:85070981183
T3 - Proceedings - 2019 16th Conference on Computer and Robot Vision, CRV 2019
SP - 17
EP - 24
BT - Proceedings - 2019 16th Conference on Computer and Robot Vision, CRV 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th Conference on Computer and Robot Vision, CRV 2019
Y2 - 29 May 2019 through 31 May 2019
ER -