Using audio, visual, and lexical features in a multi-modal virtual meeting director

Marc Al-Hames, Benedikt Hörnler, Christoph Scheuermann, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Multi-modal recordings of meetings provide the basis for meeting browsing and for remote meetings. However, it is often not useful to store or transmit all visual channels. In this work we show how a virtual meeting director selects one of seven possible video modes. We then present several audio, visual, and lexical features for a virtual director. In the experimental section we evaluate the features, their influence on the camera selection, and the properties of the generated video stream. The chosen features all allow real-time or near real-time processing and can therefore be applied not only to offline browsing, but also to a remote meeting assistant.
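The paper itself does not include code; as a minimal illustrative sketch of the idea described above (all mode names, feature names, and thresholds are hypothetical assumptions, not the authors' method), a rule-based director could map per-participant audio, visual, and lexical features to a single video mode per time step:

```python
# Hypothetical sketch of a rule-based virtual meeting director.
# Feature names, thresholds, and mode labels are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Features:
    speech_activity: float   # audio: fraction of recent frames with detected speech
    motion_energy: float     # visual: motion level in this participant's camera view
    keyword_hit: bool        # lexical: e.g. "slide" or a name was just recognized

def select_mode(features: dict[str, Features]) -> str:
    """Pick one video mode for the current time step from simple rules."""
    if any(f.keyword_hit for f in features.values()):
        return "presentation"            # lexical cue: cut to the slide/whiteboard view
    speakers = [p for p, f in features.items() if f.speech_activity > 0.5]
    if len(speakers) == 1:
        return f"closeup:{speakers[0]}"  # one active speaker: close-up on that person
    if len(speakers) > 1:
        return "overview"                # discussion: wide shot of all participants
    # Nobody is speaking: show the participant with the most visual activity.
    busiest = max(features, key=lambda p: features[p].motion_energy)
    return f"closeup:{busiest}"

# Example: one clear speaker, so the director cuts to that close-up.
mode = select_mode({
    "alice": Features(speech_activity=0.9, motion_energy=0.2, keyword_hit=False),
    "bob":   Features(speech_activity=0.1, motion_energy=0.7, keyword_hit=False),
})
print(mode)  # -> closeup:alice
```

A real director of this kind would additionally smooth mode decisions over time to avoid rapid camera switching; the simple per-step rules here only illustrate how the three feature types could drive the selection.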

Original language: English
Title of host publication: Machine Learning for Multimodal Interaction - Third International Workshop, MLMI 2006, Revised Selected Papers
Pages: 63-74
Number of pages: 12
DOIs
State: Published - 2006
Event: 3rd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2006 - Bethesda, MD, United States
Duration: 1 May 2006 → 4 May 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4299 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2006
Country/Territory: United States
City: Bethesda, MD
Period: 1/05/06 → 4/05/06
