ATTHEAR: EXPLAINING AUDIO TRANSFORMERS USING ATTENTION-AWARE NMF

Alican Akman, Björn W. Schuller

Publication: Contribution to book/report/conference proceedings > Conference contribution > Peer-reviewed

Abstract

The increasing success of transformer models in various fields, such as computer vision and audio processing, has created a growing need for better explainability of their complex decision-making processes. Most existing techniques for explaining transformer models concentrate on delivering visual and textual explanations, which are commonly used in visual media. However, audio explanations are crucial because they are intuitive for audio-based tasks and offer an expressiveness that distinguishes them from other modalities. This work proposes a novel method to interpret audio-processing transformer models. Our method combines the attention mechanism already available inside these models with non-negative matrix factorisation (NMF) to compute relevancy for audio inputs. While NMF decomposes audio into spectral patterns, attention weights are utilised to calculate the time activations of these spectral patterns. The method then generates listenable audio explanations for the model's final decision using the most relevant audio portions. Experiments on standard benchmark datasets, covering keyword spotting and environmental sound classification, show that our method generates explanations effectively.
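
The abstract describes the idea only at a high level, so the following is a minimal sketch of one way to combine NMF with attention, not the authors' implementation. It assumes that the model's attention can be reduced to a single non-negative weight per time frame (`frame_attention`), that a component's relevance is its activation summed under those weights, and that the explanation is a soft mask over the spectrogram. The function names `nmf` and `explain`, the multiplicative-update solver, and the toy spectrogram are all hypothetical choices made for illustration.

```python
import numpy as np


def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (freq x time) as W @ H.

    W holds spectral patterns (freq x k), H their time activations (k x time).
    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def explain(V, frame_attention, n_components=2, top_k=1):
    """Keep the spectral patterns whose activations coincide with attention.

    Assumption (not from the paper): relevance of a component is its
    activation weighted by a per-frame attention score.
    """
    W, H = nmf(V, n_components)
    relevance = H @ frame_attention              # one score per component
    keep = np.argsort(relevance)[::-1][:top_k]   # most relevant components
    full = W @ H + 1e-9                          # full reconstruction
    part = W[:, keep] @ H[keep, :]               # reconstruction from kept parts
    mask = part / full                           # soft mask with values in [0, 1)
    return mask * V, relevance, keep


# Toy magnitude spectrogram: two spectral patterns active in different halves.
pattern_a = np.array([1.0, 0.8, 0.1, 0.0, 0.0, 0.0])
pattern_b = np.array([0.0, 0.0, 0.1, 0.9, 1.0, 0.5])
act_a = np.r_[np.ones(10), np.zeros(10)]
act_b = np.r_[np.zeros(10), np.ones(10)]
V = np.outer(pattern_a, act_a) + np.outer(pattern_b, act_b)

# Hypothetical attention profile that favours the second half of the clip.
frame_attention = np.r_[np.full(10, 0.01), np.full(10, 0.09)]

explanation, relevance, keep = explain(V, frame_attention)
```

To obtain a listenable result, the masked magnitude spectrogram would typically be combined with the phase of the original signal and passed through an inverse short-time Fourier transform; that step is omitted here because the abstract does not specify how the audio is resynthesised.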

Original language: English
Title: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7015-7019
Number of pages: 5
ISBN (electronic): 9798350344851
Publication status: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, South Korea
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 to 19/04/24
