Request and complaint recognition in call-center speech using a pointwise-convolution recurrent network

Zhipeng Yin, Xinzhou Xu, Björn Schuller

Research output: Contribution to journal › Article › peer-review

Abstract

Request and complaint recognition in call-center speech aims to identify both the intention and the emotional state of callers by analysing the paralinguistic information conveyed in speech signals. However, existing work does not make full use of fusion across the multi-layer representations derived from foundation models, and it usually provides insufficient sequential encoding of the temporal information in speech. To recognise requests and complaints in call-center speech, we propose an approach based on a PointWise-Convolution Recurrent network (PWCR). The approach first applies a pointwise-convolution module to perform layer-wise aggregation of the representations from the multiple Transformer layers of a pre-trained foundation model. A recurrent module then captures temporal and contextual information through a recurrent layer with multi-head self-attention. Experimental results on the HealthCall30 Corpus indicate that the proposed approach achieves better recognition performance than state-of-the-art approaches, with unweighted average recalls of 78.7% (maximum) / 77.3% (average) for the request task and 60.6% (maximum) / 59.8% (average) for the complaint task.
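The results above are reported as unweighted average recall (UAR), the mean of the per-class recalls, so that each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric in plain Python. The function name, the label strings, and the toy data are hypothetical and are not taken from the paper or from the HealthCall30 Corpus.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall is computed per class, then averaged without class-size weights.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (illustration only).
y_true = ["request"] * 2 + ["no_request"] * 8
y_pred = ["request", "no_request"] + ["no_request"] * 8

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

On this toy data the minority class has a recall of 0.5 and the majority class a recall of 1.0, so UAR is 0.75 while plain accuracy is 0.90. This gap is why UAR is often preferred when class sizes are imbalanced.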

Original language: English
Journal: International Journal of Speech Technology
State: Accepted/In press - 2025

Keywords

  • Call-center speech
  • Complaint recognition
  • Foundation models
  • Pointwise-convolution recurrent network
  • Request recognition
