TY - GEN
T1 - A hybrid music retrieval system using belief networks to integrate multimodal queries and contextual knowledge
AU - Schuller, Björn
AU - Zobl, Martin
AU - Rigoll, Gerhard
AU - Lang, Manfred
N1 - Publisher Copyright:
© 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - Recently, an increasing interest in music retrieval can be observed. Due to the growing amount of music available online and offline and a broadening user spectrum, more efficient query methods are needed. We believe that only a parallel multimodal combination of different input modalities forms the most intuitive way for any user to access desired media. In this paper we introduce querying by humming, speaking, writing, and typing. The strengths of each modality are combined in a synergetic manner by soft decision fusion. Songs can be referenced by their melody, artist, title, or other specific information. Furthermore, recognition of the current user's emotion, together with external contextual knowledge, helps to build an expectation of the intended song at a given time. This constrains the hypothesis space of possible songs and leads to more robust recognition or even a suggestive query. A combination of artificial neural networks, hidden Markov models, and dynamic time warping, integrated in a Bayesian belief network framework, forms the mathematical foundation of the chosen hybrid architecture. We address the implementation of a working system and the results achieved by the introduced methods.
AB - Recently, an increasing interest in music retrieval can be observed. Due to the growing amount of music available online and offline and a broadening user spectrum, more efficient query methods are needed. We believe that only a parallel multimodal combination of different input modalities forms the most intuitive way for any user to access desired media. In this paper we introduce querying by humming, speaking, writing, and typing. The strengths of each modality are combined in a synergetic manner by soft decision fusion. Songs can be referenced by their melody, artist, title, or other specific information. Furthermore, recognition of the current user's emotion, together with external contextual knowledge, helps to build an expectation of the intended song at a given time. This constrains the hypothesis space of possible songs and leads to more robust recognition or even a suggestive query. A combination of artificial neural networks, hidden Markov models, and dynamic time warping, integrated in a Bayesian belief network framework, forms the mathematical foundation of the chosen hybrid architecture. We address the implementation of a working system and the results achieved by the introduced methods.
UR - http://www.scopus.com/inward/record.url?scp=11244339115&partnerID=8YFLogxK
U2 - 10.1109/ICME.2003.1220853
DO - 10.1109/ICME.2003.1220853
M3 - Conference contribution
AN - SCOPUS:11244339115
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 57
EP - 60
BT - Proceedings - 2003 International Conference on Multimedia and Expo, ICME
PB - IEEE Computer Society
T2 - 2003 International Conference on Multimedia and Expo, ICME 2003
Y2 - 6 July 2003 through 9 July 2003
ER -