Title
Using Representation Learning And Out-Of-Domain Data For A Paralinguistic Speech Task
Abstract
In this work, we study the paralinguistic speech task of eating condition classification and present our submitted classification system for the INTERSPEECH 2015 Computational Paralinguistics Challenge. We build upon a deep learning language identification system, which we repurpose for general audio sequence classification. The main idea is to train local convolutional neural network classifiers that automatically learn representations on smaller windows of the full sequence's spectrum, and to aggregate multiple local classifications into a full sequence classification. A particular challenge of the task is training data scarcity and the resulting overfitting of neural network methods, which we tackle with dropout, synthetic data augmentation and transfer learning with out-of-domain data from a language identification task. Our final submitted system achieved an unweighted average recall (UAR) of 75.9% for 7-way eating condition classification, which is a relative improvement of 15% over the baseline.
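The window-and-aggregate idea and the UAR metric from the abstract can be illustrated with a minimal sketch. The abstract does not state how local classifications are combined, so averaging the per-window class probabilities is an assumption here, and the function names `sliding_windows`, `aggregate` and `uar` are hypothetical. The local classifier itself (a convolutional network in the paper) is left out; only its per-window outputs are assumed.

```python
import numpy as np

def sliding_windows(spectrogram, width, hop):
    """Cut a (frames, bins) spectrogram into overlapping (width, bins) windows."""
    n_frames = spectrogram.shape[0]
    starts = range(0, max(n_frames - width, 0) + 1, hop)
    return np.stack([spectrogram[s:s + width] for s in starts])

def aggregate(local_probs):
    """Assumed aggregation rule: average the per-window class
    probabilities and return the index of the most likely class."""
    return int(np.argmax(np.asarray(local_probs).mean(axis=0)))

def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls,
    so every class counts equally regardless of its size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy usage: 10 frames, 4 frequency bins, windows of 4 frames with hop 2.
windows = sliding_windows(np.zeros((10, 4)), width=4, hop=2)
# Made-up per-window probabilities for a 2-class problem.
label = aggregate([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
score = uar([0, 0, 0, 1], [0, 0, 1, 1])
```

Because UAR averages recall per class, a classifier cannot score well by favoring the most frequent class, which matters for a 7-way task with little training data.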
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
speech classification, computational paralinguistics, neural networks, deep learning, transfer learning, data augmentation
Field
Paralanguage, Computer science, Speech recognition, Artificial intelligence, Natural language processing, Feature learning
DocType
Conference
Citations
5
PageRank
0.41
References
13
Authors
2
Name
Order
Citations
PageRank
Benjamin Milde1425.20
Chris Biemann279186.25