TY - GEN
T1 - Audio-visual processing in meetings
T2 - 3rd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2006
AU - Al-Hames, Marc
AU - Hain, Thomas
AU - Cernocky, Jan
AU - Schreiber, Sascha
AU - Poel, Mannes
AU - Müller, Ronald
AU - Marcel, Sebastien
AU - Van Leeuwen, David
AU - Odobez, Jean Marc
AU - Ba, Sileye
AU - Bourlard, Herve
AU - Cardinaux, Fabien
AU - Gatica-Perez, Daniel
AU - Janin, Adam
AU - Motlicek, Petr
AU - Reiter, Stephan
AU - Renals, Steve
AU - Van Rest, Jeroen
AU - Rienks, Rutger
AU - Rigoll, Gerhard
AU - Smith, Kevin
AU - Thean, Andrew
AU - Zemcik, Pavel
PY - 2006
Y1 - 2006
N2 - The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the required component technologies across several R&D themes: group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing work package within AMI addresses automatic recognition from audio, video, and combined audio-video streams recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the broad problem of audio-visual processing in meetings can be split into seven questions, such as "Who is acting during the meeting?". We then show which algorithms and methods have been developed and evaluated to answer these questions automatically.
AB - The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the required component technologies across several R&D themes: group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing work package within AMI addresses automatic recognition from audio, video, and combined audio-video streams recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the broad problem of audio-visual processing in meetings can be split into seven questions, such as "Who is acting during the meeting?". We then show which algorithms and methods have been developed and evaluated to answer these questions automatically.
UR - http://www.scopus.com/inward/record.url?scp=77249087752&partnerID=8YFLogxK
U2 - 10.1007/11965152_3
DO - 10.1007/11965152_3
M3 - Conference contribution
AN - SCOPUS:77249087752
SN - 3540692673
SN - 9783540692676
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 24
EP - 35
BT - Machine Learning for Multimodal Interaction - Third International Workshop, MLMI 2006, Revised Selected Papers
Y2 - 1 May 2006 through 4 May 2006
ER -