Speech analysis in the big data era

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations

Abstract

In spoken language analysis tasks, one is often faced with comparatively small corpora of only one to a few hours of speech material, mostly annotated with a single phenomenon at a time, such as a particular speaker state. In stark contrast, engines for the recognition of speakers’ emotions, sentiment, personality, or pathologies are often expected to run independently of the speaker, the spoken content, and the acoustic conditions. This lack of large and richly annotated material likely explains to a large degree the headroom left for improvement in the accuracy of today’s engines. Yet in the big data era, with the increasing availability of crowd-sourcing services and recent advances in weakly supervised learning, new opportunities arise to ease this shortage. In this light, this contribution first shows the de facto standard in terms of data availability in a broad range of speaker analysis tasks. It then introduces highly efficient ‘cooperative’ learning strategies based on the combination of active and semi-supervised learning alongside transfer learning, to best exploit available data in combination with data synthesis. Further, approaches to estimate meaningful confidence measures in this domain are suggested, as they form (part of) the basis of the weakly supervised learning algorithms. In addition, first successful approaches towards holistic speech analysis are presented, using deep recurrent rich multi-target learning with partially missing label information. Finally, steps towards the needed distribution of processing for big data handling are demonstrated.

Original language: English
Title of host publication: Text, Speech, and Dialogue - 18th International Conference, TSD 2015, Proceedings
Editors: Václav Matoušek, Pavel Král
Publisher: Springer Verlag
Pages: 3-11
Number of pages: 9
ISBN (Print): 9783319240329
State: Published - 2015
Externally published: Yes
Event: 18th International Conference on Text, Speech and Dialogue, TSD 2015 - Pilsen, Czech Republic
Duration: 14 Sep 2015 - 17 Sep 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9302
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Text, Speech and Dialogue, TSD 2015
Country/Territory: Czech Republic
City: Pilsen
Period: 14/09/15 - 17/09/15

Keywords

  • Big data
  • Paralinguistics
  • Self-learning
  • Speech analysis
