Abstract | ||
---|---|---|
We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA's advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition. |
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-1581 | 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction |
Keywords | DocType | Volume |
multi-view learning, acoustic features, canonical correlation analysis, variational methods, adversarial learning | Conference | abs/1708.04673 |
ISSN | Citations | PageRank |
2308-457X | 1 | 0.37 |
References | Authors | |
16 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qingming Tang | 1 | 16 | 4.60 |
Weiran Wang | 2 | 17 | 2.06 |
Karen Livescu | 3 | 1254 | 71.43 |