Title | ||
---|---|---|
A Hybrid Dynamic Time Warping-Deep Neural Network Architecture For Unsupervised Acoustic Modeling |
Abstract | ||
---|---|---|
We report on an architecture for the unsupervised discovery of talker-invariant subword embeddings. It consists of two components: a spoken term discovery (STD) system based on dynamic time warping (DTW) and a Siamese deep neural network (DNN). The STD system clusters word-sized repeated fragments in the acoustic streams, while the DNN is trained to minimize the distance between time-aligned frames of tokens from the same cluster and to maximize the distance between tokens from different clusters. We use additional side information regarding the average duration of phonemic units, as well as talker identity tags. For evaluation, we use the datasets and metrics of the Zero Resource Speech Challenge. The model improves on the baseline in subword unit modeling. |
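
The two-stage idea in the abstract can be illustrated with a small NumPy sketch: a DTW alignment yields frame pairs from two tokens, and a shared ("Siamese") embedding is scored so that same-cluster pairs are pulled together and different-cluster pairs are pushed apart. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the function names (`dtw_path`, `embed`, `siamese_loss`), the Euclidean local cost, the single `tanh` layer, and the cosine-based loss are all hypothetical choices.

```python
import numpy as np


def dtw_path(a, b):
    """Align two frame sequences (T_a x D and T_b x D) with textbook DTW.

    Returns the warping path as a list of (i, j) frame-index pairs.
    Euclidean local cost is an assumption of this sketch.
    """
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    # Accumulated cost with a padded border so the recursion needs no edge cases.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = dist[i - 1, j - 1] + best_prev

    # Backtrack from the last cell to the first, following the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def embed(x, W):
    """Shared embedding applied to both sides of a pair (one tanh layer, hypothetical)."""
    return np.tanh(x @ W)


def cosine(u, v):
    """Row-wise cosine similarity between two batches of vectors."""
    norms = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    return np.sum(u * v, axis=-1) / (norms + 1e-8)


def siamese_loss(xa, xb, same, W):
    """Mean pair loss: (1 - cos) for same-cluster pairs, cos^2 for different-cluster pairs.

    This particular loss is one possible choice, assumed for the sketch.
    """
    c = cosine(embed(xa, W), embed(xb, W))
    return np.where(same, 1.0 - c, c ** 2).mean()
```

In such a setup, frame pairs for the "same" class would come from `dtw_path` applied to two tokens of one discovered cluster, while "different" pairs would be drawn from tokens of distinct clusters; minimizing `siamese_loss` with respect to `W` is left to any gradient-based optimizer.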
Year | Venue | Keywords |
---|---|---|
2015 | 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5 | zero resource speech challenge, feature extraction, deep learning |
Field | DocType | Citations |
---|---|---|
Architecture,Image warping,Pattern recognition,Dynamic time warping,Computer science,Neural network architecture,Side information,Speech recognition,Time delay neural network,Artificial intelligence,Artificial neural network | Conference | 12 |
PageRank | References | Authors |
---|---|---|
0.54 | 10 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Roland Thiollière | 1 | 18 | 1.97 |
Ewan Dunbar | 2 | 71 | 5.08 |
Gabriel Synnaeve | 3 | 240 | 16.91 |
Maarten Versteegh | 4 | 49 | 3.25 |
Emmanuel Dupoux | 5 | 238 | 37.33 |