Title
Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles
Abstract
We compare novel concepts for the robust fusion of prosodic and verbal cues in speech emotion recognition. From each spoken phrase, 276 acoustic features are extracted. For linguistic content analysis we use the Bag-of-Words text representation, which allows acoustic and linguistic features to be integrated within one vector prior to final classification. Extensive feature selection is carried out with filter- and wrapper-based methods: optimal feature sets found via SVM-SFFS and single-feature relevance computed by information gain ratio are presented. Classification is realised by diverse ensemble approaches whose base classifiers include Kernel Machines, Decision Trees, Bayesian classifiers, and memory-based learners. Acoustics-only tests were run on a database comprising 39 speakers to analyse speaker-independent accuracy; the public Berlin Emotional Speech database is used in addition. A further database of 4,221 movie-related phrases forms the basis for evaluating the combined acoustic and linguistic analysis. Overall, remarkable performance in the discrimination of seven discrete emotions was observed.
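The early-fusion idea in the abstract (acoustic and Bag-of-Words features joined in one vector before classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the toy phrases, and the three stand-in acoustic values (in place of the 276 features) are all assumptions made for illustration.

```python
from collections import Counter


def build_vocabulary(phrases):
    """Collect the sorted set of lowercase tokens seen in the given phrases."""
    return sorted({token for phrase in phrases for token in phrase.lower().split()})


def bag_of_words(phrase, vocabulary):
    """Return term counts for `phrase`, one entry per vocabulary word."""
    counts = Counter(phrase.lower().split())
    return [float(counts[word]) for word in vocabulary]


def early_fusion(acoustic_features, phrase, vocabulary):
    """Concatenate acoustic and linguistic features into a single vector."""
    return list(acoustic_features) + bag_of_words(phrase, vocabulary)


# Hypothetical toy data: three acoustic values stand in for the 276 features.
phrases = ["I love this movie", "this movie is awful"]
vocabulary = build_vocabulary(phrases)
acoustic = [0.42, 118.0, 0.07]

# One fused vector per utterance; a single classifier or ensemble would then
# be trained on such vectors.
fused = early_fusion(acoustic, "I love love this", vocabulary)
```

The fused vector has one slot per acoustic feature followed by one slot per vocabulary word, so feature selection can rank acoustic and linguistic entries side by side.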
Year: 2005
Venue: INTERSPEECH
Keywords: information analysis, kernel machine, decision tree, content analysis, information gain, feature selection, bag of words, bayesian classifier
Field: Rule-based machine translation, Decision tree, Feature selection, Computer science, Phrase, Natural language processing, Artificial intelligence, Kernel (linear algebra), Content analysis, Pattern recognition, Speech recognition, Information gain ratio, Linguistics, Bayesian probability
DocType: Conference
Citations: 46
PageRank: 2.81
References: 5
Authors: 4
Name             Order  Citations  PageRank
Björn Schuller   1      6749       463.50
Ronald Müller    2      174        11.03
Manfred K. Lang  3      141        11.94
Gerhard Rigoll   4      2788       268.87