Abstract |
---|
Forced phonetic alignment (FPA) is the task of determining the time boundaries of phonetic units, i.e., calculating when in a speech utterance a given phoneme starts and ends. This paper describes experiments on FPA for Brazilian Portuguese using the Kaldi toolkit. Several acoustic models based on time-delay neural networks (TDNN) were trained on top of a combination of hidden Markov models (HMM) and Gaussian mixture models (GMM). The nature of the input features and the topology of the HMMs were varied in order to analyze the influence of each. Results on the phone boundary metric over a dataset of 385 hand-aligned utterances show that the network is mostly invariant to the input features, while regular HMM topologies perform better than a modified version used in chain models. However, the neural network still does not outperform GMM models for phonetic alignment. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/978-3-030-98305-5_30 | COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022 |
Keywords | DocType | Volume |
---|---|---|
Forced phonetic alignment, Speech segmentation, Acoustic modeling, Kaldi, Brazilian Portuguese | Conference | 13208 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |
References | Authors |
---|---|
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cassio Batista | 1 | 0 | 0.34 |
Nelson Neto | 2 | 0 | 0.34 |