TY - CONF
T1 - HMM-based music retrieval using stereophonic feature information and framelength adaptation
AU - Schuller, B.
AU - Rigoll, G.
AU - Lang, M.
N1 - Publisher Copyright:
© 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - Music retrieval methods have attracted considerable recent interest due to the growing size of music databases, e.g. on the Internet. Among the different query methods, content-based media retrieval that analyzes intrinsic characteristics of the source appears to offer the most intuitive access. The key melody of a song can be regarded as its major characteristic and leads to queries by humming or singing. In this paper we turn our attention to both the features and the matching algorithm in audio music retrieval. Current approaches advocate dynamic time warping for the matching process, mostly using MIDI data or humming itself as the reference; however, first attempts at matching humming to polyphonic audio exist. In this contribution we introduce hidden Markov models as an alternative for matching humming queries against humming itself, mobile phone ring tones, and polyphonic audio. The second object of our research is a new approach to melody enhancement prior to feature extraction that exploits stereophonic information. Further, adapting the frame length to the tempo of a musical piece throughout the extraction process helps improve similarity matching performance. The paper addresses the design of a working recognition engine and the results achieved with the aforementioned methods. A test database consisting of polyphonic audio clips, ring tones, and sung user data is described in detail.
AB - Music retrieval methods have attracted considerable recent interest due to the growing size of music databases, e.g. on the Internet. Among the different query methods, content-based media retrieval that analyzes intrinsic characteristics of the source appears to offer the most intuitive access. The key melody of a song can be regarded as its major characteristic and leads to queries by humming or singing. In this paper we turn our attention to both the features and the matching algorithm in audio music retrieval. Current approaches advocate dynamic time warping for the matching process, mostly using MIDI data or humming itself as the reference; however, first attempts at matching humming to polyphonic audio exist. In this contribution we introduce hidden Markov models as an alternative for matching humming queries against humming itself, mobile phone ring tones, and polyphonic audio. The second object of our research is a new approach to melody enhancement prior to feature extraction that exploits stereophonic information. Further, adapting the frame length to the tempo of a musical piece throughout the extraction process helps improve similarity matching performance. The paper addresses the design of a working recognition engine and the results achieved with the aforementioned methods. A test database consisting of polyphonic audio clips, ring tones, and sung user data is described in detail.
UR - http://www.scopus.com/inward/record.url?scp=84863404984&partnerID=8YFLogxK
U2 - 10.1109/ICME.2003.1221716
DO - 10.1109/ICME.2003.1221716
M3 - Conference contribution
AN - SCOPUS:84863404984
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - II-713
EP - II-716
BT - Proceedings - 2003 International Conference on Multimedia and Expo, ICME
PB - IEEE Computer Society
T2 - 2003 International Conference on Multimedia and Expo, ICME 2003
Y2 - 6 July 2003 through 9 July 2003
ER -