A multi-modal graphical model for robust recognition of group actions in meetings from disturbed videos

Marc Al-Hames, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In this work we present a novel multi-modal mixed-state dynamic Bayesian network (DBN) for robust meeting event classification from disturbed videos. The model uses information from the audio and the visual channel to structure meetings into segments. Within the DBN, a multi-stream hidden Markov model (HMM) is coupled with a linear dynamical system (LDS) to compensate for disturbances in the visual channel, with the HMM serving as the driving input for the LDS. The model can thus handle noise and occlusions in the video. Experimental results on real meeting data show that the new model is highly preferable to all single-stream approaches. Compared to a baseline multi-modal early-fusion HMM, the new DBN performs 3.5% better on clean data and up to 6.1% better on visually disturbed data, corresponding to relative error reductions of 23.6% and 29.9%, respectively.
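The coupling described in the abstract, where the discrete HMM state acts as the driving input of the LDS, can be illustrated with a minimal sketch. This is not the authors' implementation; all matrices, dimensions, and state counts below are made-up placeholders chosen only to show the mixed-state update structure.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 3   # hypothetical number of discrete meeting-event states
DIM = 2        # hypothetical continuous LDS state dimension

# Placeholder parameters (illustrative only, not from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                    # LDS transition matrix
B = rng.standard_normal((N_STATES, DIM))      # one driving input per HMM state
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])               # HMM transition matrix

def step(q_prev: int, x_prev: np.ndarray):
    """One mixed-state transition: sample the discrete HMM state,
    then update the continuous LDS state it drives."""
    q = rng.choice(N_STATES, p=T[q_prev])     # discrete (HMM) transition
    x = A @ x_prev + B[q]                     # HMM state selects the LDS input
    return q, x

q, x = 0, np.zeros(DIM)
for _ in range(5):
    q, x = step(q, x)
```

The key design point mirrored here is that occlusions or noise corrupting the continuous (visual) observations can be smoothed by the LDS dynamics, while the discrete chain keeps supplying event-level structure as its input.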

Original language: English
Title of host publication: IEEE International Conference on Image Processing 2005, ICIP 2005
Pages: 421-424
Number of pages: 4
DOIs
State: Published - 2005
Event: IEEE International Conference on Image Processing 2005, ICIP 2005 - Genova, Italy
Duration: 11 Sep 2005 - 14 Sep 2005

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 3
ISSN (Print): 1522-4880

Conference

Conference: IEEE International Conference on Image Processing 2005, ICIP 2005
Country/Territory: Italy
City: Genova
Period: 11/09/05 - 14/09/05
