Abstract |
---|
This work proposes a method to improve the performance of automatic phonetic alignment of speech data. The method uses a deep convolutional neural network (CNN), trained on a combination of acoustic features extracted from labeled data, to fine-tune the position of each boundary within a fixed-size window around its original position. The proposed method is robust to speaker identity: a system trained with enough labeled data can be used to fine-tune the alignment of any speech file, regardless of the speaker. With an absolute gain between 20% and 33% in the cross-speaker scenario, our results demonstrate the applicability of deep learning to this task. |
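
The boundary refinement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the CNN output is turned into a new boundary position, so the sketch assumes a per-frame boundary score and picks the best-scoring frame inside the window. The names `refine_boundaries`, `score_frame`, `half_window` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a trained network.

```python
from typing import Callable, List, Sequence


def refine_boundaries(
    boundaries: Sequence[int],
    score_frame: Callable[[int], float],
    half_window: int,
    num_frames: int,
) -> List[int]:
    """Move each boundary to the best-scoring frame within +/- half_window.

    `score_frame` is an assumed interface: it returns a higher value for
    frames that look more like a phone boundary. In the paper this role
    would be played by the trained CNN; here it is any callable.
    """
    refined = []
    for b in boundaries:
        # Clip the fixed-size search window to the valid frame range.
        lo = max(0, b - half_window)
        hi = min(num_frames - 1, b + half_window)
        # Pick the highest score; ties go to the frame closest to the
        # original boundary.
        best = max(
            range(lo, hi + 1),
            key=lambda t: (score_frame(t), -abs(t - b)),
        )
        refined.append(best)
    return refined


def toy_score(t: int) -> float:
    """Hypothetical scorer with peaks at frames 12 and 30 (not a CNN)."""
    return -float(min(abs(t - 12), abs(t - 30)))


if __name__ == "__main__":
    # Initial boundaries from a first-pass aligner (made-up values).
    print(refine_boundaries([10, 33], toy_score, half_window=5, num_frames=50))
```

Because the search is limited to a fixed window, a boundary can move by at most `half_window` frames, so the refinement can only correct errors of the first-pass alignment that fall within that range.
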
Year | Venue | Field |
---|---|---|
2018 | PROPOR | Pattern recognition, Convolutional neural network, Segmentation, Computer science, Absolute gain, Artificial intelligence, Labeled data, Deep learning, Deep neural networks |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
5 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Luis Gustavo D. Cuozzo | 1 | 0 | 0.34 |
Diego Augusto Silva | 2 | 0 | 0.34 |
Mario Uliani Neto | 3 | 2 | 0.71 |
Flávio Olmos Simões | 4 | 0 | 0.34 |
Edson Jose Nagle | 5 | 0 | 0.34 |