"The Godfather" vs. "Chaos": Comparing linguistic analysis based on on-line knowledge sources and bags-of-N-grams for movie review valence estimation

Björn Schuller, Joachim Schenk, Gerhard Rigoll, Tobias Knaup

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

In the fields of sentiment and emotion recognition, bag-of-words modeling has lately become popular for the estimation of valence in text. A typical application is the evaluation of reviews of, e.g., movies, music, or games. In this respect, we suggest the use of back-off N-grams as the basis for a vector space construction in order to combine the advantages of word-order modeling and easy integration into potential acoustic feature vectors intended for spoken-document retrieval. For a fine-grained estimate, we consider data-driven regression next to classification based on Support Vector Machines. Alternatively, the on-line knowledge sources ConceptNet, General Inquirer, and WordNet not only serve to reduce out-of-vocabulary events, but also as the basis for a purely linguistic analysis. As a special benefit, this approach does not demand labeled training data. A large set of 100 k movie reviews spanning 20 years, stemming from Metacritic, is utilized throughout an extensive parameter discussion and comparative evaluation, effectively demonstrating the efficiency of the proposed methods.
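
To illustrate the vector space idea named in the abstract, the following is a minimal sketch of a bag-of-N-grams count vector. It is not the authors' implementation: the function names, the toy reviews, and the whitespace tokenization are all assumptions made for illustration, and "back-off" is interpreted loosely here as keeping lower-order N-grams alongside higher-order ones, so that a review still contributes features when a longer N-gram was never seen in training.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Yield every n-gram of order 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(documents, max_n):
    """Map each n-gram seen in the training documents to a column index."""
    vocab = {}
    for doc in documents:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for gram in ngrams(doc.lower().split(), max_n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(document, vocab, max_n):
    """Return a count vector; n-grams absent from the vocabulary are skipped."""
    tokens = document.lower().split()
    counts = Counter(g for g in ngrams(tokens, max_n) if g in vocab)
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        vec[vocab[gram]] = count
    return vec


# Hypothetical toy reviews, not taken from the Metacritic data set.
train = ["a truly great movie", "not a great movie"]
vocab = build_vocabulary(train, max_n=2)

# "film" is out of vocabulary, so the bigram ("great", "film") is dropped,
# but the unigrams "a" and "great" and the bigram ("a", "great") still count.
vec = vectorize("a great film", vocab, max_n=2)
```

A vector of this kind could then be passed to a classifier or regressor such as a Support Vector Machine, which is the learning setup the abstract mentions.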

Original language: English
Title of host publication: ICDAR2009 - 10th International Conference on Document Analysis and Recognition
Pages: 858-862
Number of pages: 5
State: Published - 2009
Event: ICDAR2009 - 10th International Conference on Document Analysis and Recognition - Barcelona, Spain
Duration: 26 Jul 2009 to 29 Jul 2009

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
ISSN (Print): 1520-5363

Conference

Conference: ICDAR2009 - 10th International Conference on Document Analysis and Recognition
Country/Territory: Spain
City: Barcelona
Period: 26/07/09 to 29/07/09
