Forced Phonetic Alignment in Brazilian Portuguese Using Time-Delay Neural Networks - Citegraph

Paper Info

Title
Forced Phonetic Alignment in Brazilian Portuguese Using Time-Delay Neural Networks

Abstract
Forced phonetic alignment (FPA) is the task of determining the time boundaries of phonetic units, i.e., calculating when in a speech utterance a given phoneme starts and ends. This paper describes experiments on FPA for Brazilian Portuguese using the Kaldi toolkit. Several acoustic models based on time-delay neural networks (TDNN) were trained on top of a combination of hidden Markov models (HMM) and Gaussian mixture models (GMM). The nature of the input features and the topology of the HMMs were varied in order to analyze the influence of each. Results on the phone boundary metric over a dataset of 385 hand-aligned utterances show that the network is mostly invariant to the input features, while regular HMM topologies perform better than the modified version used in chain models. However, the neural network still does not outperform GMM models for phonetic alignment.

Year	2022
DOI	10.1007/978-3-030-98305-5_30
Venue	COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022
Keywords	Forced phonetic alignment, Speech segmentation, Acoustic modeling, Kaldi, Brazilian Portuguese
DocType	Conference
Volume	13208
ISSN	0302-9743
Citations	0
PageRank	0.34
References	0
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Cassio Batista	1	0	0.34
Nelson Neto	2	0	0.34
