**Abstract**

We show that it is possible to learn an efficient acoustic model using only a small amount of easily available word-level similarity annotations. In contrast to the detailed phonetic labeling required by classical speech recognition technologies, the only information our method requires is pairs of speech excerpts known to be similar (same word) and pairs of speech excerpts known to be different (different words). An acoustic model is obtained by training shallow and deep neural networks with an architecture and a cost function well adapted to the nature of the provided information. The resulting model is evaluated on an ABX minimal-pair discrimination task and is shown to perform much better (11.8% ABX error rate) than raw speech features (19.6%), not far from a fully supervised baseline (best neural network: 9.2%; HMM-GMM: 11%).
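The ABX minimal-pair discrimination task mentioned in the abstract can be sketched in a few lines: given a triplet (A, B, X) where X belongs to the same word category as A, an error is counted whenever X ends up closer to B than to A. The sketch below is a simplified illustration that assumes fixed-length feature vectors compared with cosine distance; in practice, speech ABX evaluations typically compare variable-length sequences (e.g. via DTW over frame-level distances), a detail not covered by the abstract.

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus the cosine similarity of two feature vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_error_rate(triplets, dist=cosine_distance):
    """Fraction of (A, B, X) triplets misclassified, where X is from
    the same word category as A. An error occurs when X is closer to
    B than to A; exact ties count as half an error."""
    errors = 0.0
    for a, b, x in triplets:
        d_ax, d_bx = dist(a, x), dist(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triplets)
```

With this scorer, 0% error means the learned features always place same-word excerpts closer together than different-word ones; the abstract's 11.8% figure would correspond to roughly one misordered triplet in eight.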
Year | DOI | Venue
---|---|---
2014 | 10.1109/SLT.2014.7078558 | Spoken Language Technology Workshop
Keywords | Field | DocType
---|---|---
learning (artificial intelligence), neural nets, speech processing, speech recognition, ABX minimal-pair discrimination task, acoustic model, classical speech recognition technologies, deep neural network training, phonetic embedding learning, shallow neural network training, side information, speech excerpts, word-level similarity annotations, ABX, deep neural network, semi-supervised, speech, speech embeddings | Speech corpus, Speech analytics, Computer science, Voice activity detection, Word error rate, Phonetics, ABX test, Speech recognition, Time delay neural network, Natural language processing, Artificial intelligence, Acoustic model | Conference
ISSN | Citations | PageRank
---|---|---
2639-5479 | 16 | 0.87

References | Authors
---|---
8 | 3
Name | Order | Citations | PageRank
---|---|---|---
Gabriel Synnaeve | 1 | 240 | 16.91 |
Thomas Schatz | 2 | 56 | 2.44 |
Emmanuel Dupoux | 3 | 238 | 37.33 |