ATTHEAR: EXPLAINING AUDIO TRANSFORMERS USING ATTENTION-AWARE NMF

Alican Akman, Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

The increasing success of transformer models in fields such as computer vision and audio processing has created a growing need for improved explainability, so that their complex decision-making processes can be better understood. Most existing techniques for explaining transformer models concentrate on visual and textual explanations, which are common in visual media. However, audio explanations are crucial for audio-based tasks because they are intuitive and offer expressiveness that other modalities lack. This work proposes a novel method to interpret audio-processing transformer models. The method combines the attention mechanism already available inside these models with non-negative matrix factorisation (NMF) to compute relevancy for audio inputs. While NMF decomposes audio into spectral patterns, attention weights are used to calculate the time activations of these spectral patterns. The method then generates listenable audio explanations for the model's final decision using the most relevant audio portions. Benchmarks on standard datasets, covering keyword spotting and environmental sound classification, show that the method generates explanations effectively.
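
The idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `nmf` and `attention_weighted_mask`, the per-frame `attention` vector, the `top_k` selection rule, and the toy spectrogram are all assumptions made for illustration. The sketch factorises a non-negative spectrogram with standard multiplicative-update NMF, scales each component's time activations by attention, and builds a soft mask from the most relevant components. Turning the masked spectrogram back into listenable audio (for example with an inverse STFT) is left out.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (freq x time) as W @ H
    using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps   # spectral patterns
    H = rng.random((n_components, n_time)) + eps   # time activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def attention_weighted_mask(W, H, attention, top_k=1, eps=1e-9):
    """Hypothetical relevance step (an assumption, not the paper's rule):
    scale activations by per-frame attention, keep the top_k components
    by total attended activation, and return a soft mask in [0, 1]."""
    H_att = H * attention[np.newaxis, :]           # attention-aware activations
    relevance = H_att.sum(axis=1)                  # one score per component
    keep = np.argsort(relevance)[::-1][:top_k]     # most relevant components
    relevant = W[:, keep] @ H_att[keep, :]
    return np.clip(relevant / (W @ H + eps), 0.0, 1.0)

# Toy spectrogram: two spectral patterns, each active in one half of the clip.
rng = np.random.default_rng(1)
W_true = rng.random((16, 2))
H_true = np.zeros((2, 20))
H_true[0, :10] = 1.0
H_true[1, 10:] = 1.0
V = W_true @ H_true

# Assumed attention: the model attends only to the first ten frames.
attention = np.concatenate([np.ones(10), np.zeros(10)])

W, H = nmf(V, n_components=2)
mask = attention_weighted_mask(W, H, attention, top_k=1)
explained = mask * V                               # masked magnitude spectrogram
```

In this toy setting the mask is zero wherever attention is zero, so `explained` retains energy only from the attended frames. With a real model, the attention vector would have to be derived from the transformer's attention maps and aligned to spectrogram frames, a step this sketch does not attempt.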

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7015-7019
Number of pages: 5
ISBN (Electronic): 9798350344851
State: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Audio Explainability
  • Audio Transformers
  • Computer Audition
  • Explainable Artificial Intelligence
