TY - GEN
T1 - Recent developments in openSMILE, the Munich open-source multimedia feature extractor
AU - Eyben, Florian
AU - Weninger, Felix
AU - Gross, Florian
AU - Schuller, Björn
PY - 2013
Y1 - 2013
N2 - We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 now unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries), such as moments, peaks, regression parameters, etc. Postprocessing of the features includes statistical classifiers such as support vector machine models or file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music and video features including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory model based loudness, voice quality, local binary pattern, color, and optical flow histograms. Besides, voice activity detection, pitch tracking and face detection are supported. openSMILE is implemented in C++, using standard open source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
AB - We present recent developments in the openSMILE feature extraction toolkit. Version 2.0 now unites feature extraction paradigms from speech, music, and general sound events with basic video features for multi-modal processing. Descriptors from audio and video can be processed jointly in a single framework allowing for time synchronization of parameters, on-line incremental processing as well as off-line and batch processing, and the extraction of statistical functionals (feature summaries), such as moments, peaks, regression parameters, etc. Postprocessing of the features includes statistical classifiers such as support vector machine models or file export for popular toolkits such as Weka or HTK. Available low-level descriptors include popular speech, music and video features including Mel-frequency and similar cepstral and spectral coefficients, Chroma, CENS, auditory model based loudness, voice quality, local binary pattern, color, and optical flow histograms. Besides, voice activity detection, pitch tracking and face detection are supported. openSMILE is implemented in C++, using standard open source libraries for on-line audio and video input. It is fast, runs on Unix and Windows platforms, and has a modular, component based architecture which makes extensions via plug-ins easy. openSMILE 2.0 is distributed under a research license and can be downloaded from http://opensmile.sourceforge.net/.
KW - Audio features
KW - Multimodal fusion
KW - Real-time processing
KW - Video features
UR - http://www.scopus.com/inward/record.url?scp=84887494391&partnerID=8YFLogxK
U2 - 10.1145/2502081.2502224
DO - 10.1145/2502081.2502224
M3 - Conference contribution
AN - SCOPUS:84887494391
SN - 9781450324045
T3 - MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
SP - 835
EP - 838
BT - MM 2013 - Proceedings of the 2013 ACM Multimedia Conference
T2 - 21st ACM International Conference on Multimedia, MM 2013
Y2 - 21 October 2013 through 25 October 2013
ER -