Boosting multi-modal camera selection with semantic features

Benedikt Hörnler, Dejan Arsić, Björn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In this work, semantic features are used to improve the results of automatic camera selection. These semantic features are group action, person action, and person speaking. For this purpose, low-level acoustic and visual features are combined with high-level semantic ones. After feature fusion, segmentation and classification are performed by Hidden Markov Models. The evaluation shows that an absolute improvement of 6.5% can be achieved: the frame error rate is reduced to 38.1% by using acoustic and all semantic features, whereas the best model using only low-level features achieves a frame error rate of 44.6%, previously the best result reported on this data set.
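The pipeline described in the abstract (fused low-level and semantic features, followed by joint segmentation and classification with a Hidden Markov Model) can be sketched with a toy Viterbi decoder. Everything below is a hypothetical illustration: the camera states, the discretized observation symbols, and all probabilities are made-up stand-ins, not the paper's trained models or feature set.

```python
import math

# Hypothetical sketch: HMM states are candidate cameras; observations are
# discretized fused features (here, simplified semantic-style symbols).
# All probabilities are illustrative, not taken from the paper.
STATES = ["cam_overview", "cam_closeup"]
start = {"cam_overview": 0.6, "cam_closeup": 0.4}
trans = {
    "cam_overview": {"cam_overview": 0.8, "cam_closeup": 0.2},
    "cam_closeup": {"cam_overview": 0.3, "cam_closeup": 0.7},
}
emit = {
    "cam_overview": {"silence": 0.5, "person_speaking": 0.1, "group_action": 0.4},
    "cam_closeup": {"silence": 0.1, "person_speaking": 0.8, "group_action": 0.1},
}

def viterbi(observations):
    """Return the most likely camera sequence for the observation frames."""
    # Log-probabilities avoid numerical underflow on long sequences.
    V = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
          for s in STATES}]
    back = []
    for o in observations[1:]:
        scores, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES,
                            key=lambda p: V[-1][p] + math.log(trans[p][s]))
            scores[s] = (V[-1][best_prev] + math.log(trans[best_prev][s])
                         + math.log(emit[s][o]))
            ptr[s] = best_prev
        V.append(scores)
        back.append(ptr)
    # Backtrack from the best final state to recover the camera sequence.
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

frames = ["silence", "person_speaking", "person_speaking", "group_action"]
print(viterbi(frames))
```

Because the transition probabilities penalize camera changes, the decoder performs segmentation and classification jointly, exactly the property that makes HMMs a natural fit for frame-level camera selection.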

Original language: English
Title of host publication: Proceedings - 2009 IEEE International Conference on Multimedia and Expo, ICME 2009
Pages: 1298-1301
Number of pages: 4
DOIs
State: Published - 2009
Event: 2009 IEEE International Conference on Multimedia and Expo, ICME 2009 - New York, NY, United States
Duration: 28 Jun 2009 - 3 Jul 2009

Publication series

Name: Proceedings - 2009 IEEE International Conference on Multimedia and Expo, ICME 2009

Conference

Conference: 2009 IEEE International Conference on Multimedia and Expo, ICME 2009
Country/Territory: United States
City: New York, NY
Period: 28/06/09 - 3/07/09

Keywords

  • Human-machine interaction
  • Machine learning
  • Meeting analysis
  • Multi cameras
  • Multi-modal low level features
