DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon. - Citegraph

Paper Info

Title
DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon.

Abstract
Finding word boundaries in continuous speech is challenging as there is little or no equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models for text segmentation use a Dirichlet process to jointly segment sentences and build a lexicon of word types. We introduce DP-Parse, which uses similar principles but only relies on an instance lexicon of word tokens, avoiding the clustering errors that arise with a lexicon of word types. On the Zero Resource Speech Benchmark 2017, our model sets a new speech segmentation state-of-the-art in 5 languages. The algorithm monotonically improves with better input representations, achieving yet higher scores when fed with weakly supervised inputs. Despite lacking a type lexicon, DP-Parse can be pipelined to a language model and learn semantic and syntactic representations as assessed by a new spoken word embedding benchmark.

Year	Venue	DocType
2022	Transactions of the Association for Computational Linguistics	Journal
Volume	ISSN	Citations
10	2307-387X	0
PageRank	References	Authors
0.34	0	8

Authors (8 rows)

Name	Order	Citations	PageRank
Robin Algayres	1	2	0.68
Tristan Ricoul	2	0	0.34
Julien Karadayi	3	0	0.34
Hugo Laurençon	4	0	0.34
Salah Zaiem	5	2	1.03
Abdelrahman Mohamed	6	15	1.70
Benoît Sagot	7	326	49.52
Emmanuel Dupoux	8	238	37.33
