TY - GEN
T1 - End-to-End Spiking Neural Network for Speech Recognition Using Resonating Input Neurons
AU - Auge, Daniel
AU - Hille, Julian
AU - Kreutz, Felix
AU - Mueller, Etienne
AU - Knoll, Alois
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - The growing demand for complex computations in edge devices requires the development of algorithms and hardware accelerators that are powerful while remaining energy-efficient. Spiking neural networks are a possible solution, as they have been demonstrated to be energy-efficient in several data processing and classification tasks when executed on specialized neuromorphic hardware. In the field of speech processing, they are especially suited for the online classification of audio streams due to their strong temporal affinity. However, so far, there has been a lack of emphasis on small-scale networks that will ultimately fit into restricted neuromorphic implementations. We propose the use of resonating neurons as an input layer to spiking neural networks for online audio classification to enable an end-to-end solution. We compare different architectures to the established method of using mel-frequency-based spectral features. With our approach, spiking neural networks can be directly used without additional preprocessing, thereby making them suitable for simple continuous low-power analysis of audio streams. We compare the classification accuracy of different network architectures with ours in a keyword spotting benchmark to demonstrate the performance of our approach.
KW - Keyword detection
KW - Speech processing
KW - Spiking neural networks
UR - http://www.scopus.com/inward/record.url?scp=85115683403&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-86383-8_20
DO - 10.1007/978-3-030-86383-8_20
M3 - Conference contribution
AN - SCOPUS:85115683403
SN - 9783030863821
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 245
EP - 256
BT - Artificial Neural Networks and Machine Learning – ICANN 2021 - 30th International Conference on Artificial Neural Networks, Proceedings
A2 - Farkaš, Igor
A2 - Masulli, Paolo
A2 - Otte, Sebastian
A2 - Wermter, Stefan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Artificial Neural Networks, ICANN 2021
Y2 - 14 September 2021 through 17 September 2021
ER -