Lecture Video Highlights Detection from Speech

  • Meishu Song
  • Ilhan Aslan
  • Emilia Parada-Cabaleiro
  • Zijiang Yang
  • Elisabeth André
  • Yoshiharu Yamamoto
  • Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In interpersonal co-located and online teaching, lecturers highlight words and sentences in their speech in order to implicitly communicate that particular content is important. This social behaviour, aimed at capturing students' attention, becomes crucial in distance learning, where the teacher's voice is an essential instrument for maximising students' attention. For intelligent systems such as smart tutors to understand and replicate this social behaviour, they need to be able to recognise speech-based highlighting automatically. To this end, we introduce a public corpus for the automatic detection of speech-based highlighting in a learning context. By "highlighting" we refer to the emphasised content, i.e., the important content that lecturers try to emphasise (highlight) by attracting the listeners' attention. The dataset is derived from YouTube tutorial videos featuring 104 different English speakers who cover different disciplines. The dataset will be made freely available to the community. In addition, to provide an initial analysis of the corpus, we report on a series of experiments, with the best results achieved by combining a VGG network with a Transformer architecture. Our initial results of 78.2% accuracy and 78.8% Unweighted Average Recall (UAR) encourage us to believe that this new dataset will facilitate progress in speech processing research for education.
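The abstract reports both accuracy and Unweighted Average Recall (UAR). UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has; unlike plain accuracy, it is not inflated by a classifier that favours the majority class. The sketch below shows how the two metrics are commonly computed. It assumes a hypothetical binary labelling (1 = highlighted segment, 0 = not highlighted); the record does not state the corpus's exact label scheme or class balance, and the toy labels are illustrative, not data from the paper.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class present in y_true is weighted equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = defaultdict(int)    # correctly predicted samples per reference class
    totals = defaultdict(int)  # number of reference samples per class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = highlighted, 0 = not highlighted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

In this toy case, accuracy is 0.8, but UAR is only 0.75 because the minority "highlighted" class is recalled at 0.5 while the majority class is recalled at 1.0. This is why UAR is often reported alongside accuracy when classes may be imbalanced.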

Original language: English
Title of host publication: 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 361-365
Number of pages: 5
ISBN (Electronic): 9789464593617
State: Published - 2024
Externally published: Yes
Event: 32nd European Signal Processing Conference, EUSIPCO 2024 - Lyon, France
Duration: 26 Aug 2024 to 30 Aug 2024

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491

Conference

Conference: 32nd European Signal Processing Conference, EUSIPCO 2024
Country/Territory: France
City: Lyon
Period: 26/08/24 to 30/08/24

Keywords

  • Highlighting Content
  • Speech
  • Transformer
  • VGG
