Title | ||
---|---|---|
Using Representation Learning And Out-Of-Domain Data For A Paralinguistic Speech Task |
Abstract | ||
---|---|---|
In this work, we study the paralinguistic speech task of eating condition classification and present our submitted classification system for the INTERSPEECH 2015 Computational Paralinguistics challenge. We build upon a deep learning language identification system, which we repurpose for general audio sequence classification. The main idea is to train local convolutional neural network classifiers that automatically learn representations on smaller windows of the full sequence's spectrum, and to aggregate multiple local classifications into a full-sequence classification. A particular challenge of the task is training data scarcity and the resulting overfitting of neural network methods, which we tackle with dropout, synthetic data augmentation and transfer learning with out-of-domain data from a language identification task. Our final submitted system achieved an unweighted average recall (UAR) of 75.9% for 7-way eating condition classification, which is a relative improvement of 15% over the baseline. |
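
The window-then-aggregate idea and the UAR metric named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: window posteriors are combined by simple averaging (the abstract does not state the aggregation rule), and the function names `split_into_windows`, `aggregate_window_probs` and `unweighted_average_recall` are hypothetical. The per-window probabilities would come from a trained local classifier, which is not shown here.

```python
from collections import defaultdict


def split_into_windows(num_frames, window, hop):
    """Return (start, end) frame indices of fixed-size windows over a sequence."""
    if num_frames < window:
        # Sequence shorter than one window: use it as a single window.
        return [(0, num_frames)]
    return [(s, s + window) for s in range(0, num_frames - window + 1, hop)]


def aggregate_window_probs(window_probs):
    """Average per-window class probabilities (an assumed rule) and pick the arg-max."""
    num_classes = len(window_probs[0])
    mean = [
        sum(p[c] for p in window_probs) / len(window_probs)
        for c in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda c: mean[c])
    return best, mean


def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so each class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Toy 2-class posteriors for three windows of one utterance.
    label, mean = aggregate_window_probs([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
    print(split_into_windows(10, 4, 3), label, mean)
    print(unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1]))
```

Because UAR averages recall over classes rather than over samples, a system cannot score well by favouring the most frequent class.
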
Year | Venue | Keywords |
---|---|---|
2015 | 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5 | speech classification, computational paralinguistics, neural networks, deep learning, transfer learning, data augmentation |
Field | DocType | Citations |
---|---|---|
Paralanguage, Computer science, Speech recognition, Artificial intelligence, Natural language processing, Feature learning | Conference | 5 |
PageRank | References | Authors |
---|---|---|
0.41 | 13 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Benjamin Milde | 1 | 42 | 5.20 |
Chris Biemann | 2 | 791 | 86.25 |