Title
Forced Phonetic Alignment in Brazilian Portuguese Using Time-Delay Neural Networks
Abstract
Forced phonetic alignment (FPA) is the task of determining the time boundaries of phonetic units, i.e., calculating when in a speech utterance a given phoneme starts and ends. This paper describes experiments on FPA for Brazilian Portuguese using the Kaldi toolkit. Several acoustic models based on time-delay neural networks (TDNN) were trained on top of a combination of hidden Markov models (HMM) and Gaussian mixture models (GMM). The nature of the input features and the topology of the HMMs were varied in order to analyze the influence of each. Results with respect to the phone boundary metric over a dataset of 385 hand-aligned utterances show that the network is mostly invariant to the input features, while regular HMM topologies perform better than the modified version used in chain models. However, the neural network still does not outperform GMM models for phonetic alignment.
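The abstract does not define the phone boundary metric. As a rough illustration only, the sketch below assumes it compares each predicted boundary with its hand-aligned counterpart and reports the share of boundaries whose absolute error falls within a tolerance. The function names, the millisecond timestamps, and the tolerance values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def boundary_errors(reference_ms: Sequence[int], predicted_ms: Sequence[int]) -> List[int]:
    """Absolute difference, in milliseconds, between paired phone boundaries."""
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("both alignments must contain the same number of boundaries")
    return [abs(r - p) for r, p in zip(reference_ms, predicted_ms)]


def fraction_within(errors_ms: Sequence[int], tolerance_ms: int) -> float:
    """Share of boundaries whose error does not exceed the tolerance."""
    if not errors_ms:
        raise ValueError("no boundaries to score")
    return sum(e <= tolerance_ms for e in errors_ms) / len(errors_ms)


# Hypothetical phone end times (ms) for one utterance: hand-aligned vs. forced-aligned.
reference = [120, 310, 470, 800]
predicted = [130, 290, 520, 800]

errors = boundary_errors(reference, predicted)
for tolerance in (10, 20, 50):  # assumed tolerance thresholds, in ms
    print(f"within {tolerance} ms: {fraction_within(errors, tolerance):.2f}")
```

Integer milliseconds are used here so that threshold comparisons are not affected by floating-point rounding.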
Year
2022
DOI
10.1007/978-3-030-98305-5_30
Venue
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022
Keywords
Forced phonetic alignment, Speech segmentation, Acoustic modeling, Kaldi, Brazilian Portuguese
DocType
Conference
Volume
13208
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Cassio Batista  1      0          0.34
Nelson Neto     2      0          0.34