Abstract | ||
---|---|---|
This paper proposes a two-stage deep feed-forward neural network (DNN) to tackle the acoustic-to-articulatory inversion (AAI) problem. DNNs are a viable solution for the AAI task, but the temporal continuity of the estimated articulatory values has not been properly exploited when a DNN is employed. In this work, we address the lack of temporal constraints while keeping the solution parameter-parsimonious by deploying a two-stage approach based only on DNNs: (i) articulatory trajectories are estimated in a first stage using a DNN, and (ii) a temporal window of the estimated trajectories is used in a follow-up DNN stage as a refinement. The first-stage estimate can be thought of as auxiliary information that imposes constraints on the inversion process. Experimental evidence demonstrates an average error reduction of 7.51% in terms of RMSE compared to the baseline, and an improvement of 2.39% in Pearson correlation is also attained. Finally, we should point out that AAI is still a highly challenging problem, mainly because the acoustic-to-articulatory mapping is non-linear and one-to-many. It is thus promising that a significant improvement was attained with our simple yet elegant solution. |
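
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `stage1` and `stage2` are hypothetical callables standing in for the two trained DNNs, the window half-width and the edge-padding strategy are assumptions, and the sketch assumes the second stage sees only the windowed first-stage estimates (the abstract does not say whether acoustic features are also fed to it). The `rmse` and `pearson` helpers show the two evaluation metrics named in the abstract.

```python
import math


def stack_context(frames, half_width):
    """Stack each frame with its neighbours into one flat window.

    frames: list of per-frame feature lists (e.g. first-stage estimates).
    Edge frames are repeated as padding (an assumption of this sketch).
    """
    n = len(frames)
    windows = []
    for t in range(n):
        window = []
        for offset in range(-half_width, half_width + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index to [0, n - 1]
            window.extend(frames[idx])
        windows.append(window)
    return windows


def two_stage_inversion(acoustic_frames, stage1, stage2, half_width=2):
    """Run a frame-wise first stage, then refine using a temporal window.

    stage1, stage2: hypothetical callables mapping a feature list to a list.
    """
    first_pass = [stage1(x) for x in acoustic_frames]
    refined_inputs = stack_context(first_pass, half_width)
    return [stage2(w) for w in refined_inputs]


def rmse(pred, ref):
    """Root-mean-square error between two equal-length trajectories."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(pred, ref):
    """Pearson correlation between two equal-length trajectories."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)
```

In this sketch the second stage receives `2 * half_width + 1` consecutive first-stage estimates per frame, which is one simple way to expose temporal context to a feed-forward network.
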
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICASSP39728.2021.9413742 | 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) |
Keywords | DocType | Citations |
---|---|---|
Acoustic-to-articulatory inversion, deep learning, DNN, FBE | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abdolreza Sabzi Shahrebabaki | 1 | 1 | 3.41 |
Negar Olfati | 2 | 1 | 2.40 |
Ali Shariq Imran | 3 | 49 | 17.47 |
Magne Hallstein Johnsen | 4 | 19 | 6.27 |
Sabato Marco Siniscalchi | 5 | 310 | 30.21 |
Torbjørn Svendsen | 6 | 161 | 21.26 |