Title
Data Augmentation Strategies For Neural Network F0 Estimation
Abstract
This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies are split into additive noise and channel-based augmentation and into vocoder-based augmentation methods. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of ground truth F0 used for training of the neural network, as well as to expand the training data diversity in terms of F0 patterns and vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can be used to greatly increase the noise robustness of trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also be used to increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness.
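The additive noise augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_noise_at_snr`, the synthetic 120 Hz sine standing in for voiced speech, the white-noise source, and the 5 dB target are all assumptions chosen for illustration. The sketch only shows the general idea of scaling a noise signal so that the mixture reaches a chosen signal-to-noise ratio.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is too short, then trim it to the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")

    # Choose the gain so that speech_power / (gain^2 * noise_power)
    # equals 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Assumed toy signals: a 120 Hz sine as a stand-in for voiced speech,
# and white Gaussian noise as the additive noise source.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2.0 * np.pi * 120.0 * t)
noise = rng.standard_normal(sample_rate // 2)

noisy = add_noise_at_snr(speech, noise, snr_db=5.0)

# Measure the SNR that was actually achieved in the mixture.
achieved_snr_db = 10.0 * np.log10(
    np.mean(speech ** 2) / np.mean((noisy - speech) ** 2)
)
```

Because the noise is only added on top of the clean signal, F0 labels obtained from the clean signal can typically be reused for the noisy copy, so one labeled utterance can yield several training examples at different SNRs.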
Year: 2019
DOI: 10.1109/icassp.2019.8683041
Venue: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Keywords: Speech analysis, F0 estimation, noise robustness, data augmentation, deep learning
Field: Training set, Fundamental frequency, Pattern recognition, Computer science, Communication channel, Robustness (computer science), Ground truth, Artificial intelligence, Deep learning, Artificial neural network, Vocal tract
DocType: Conference
ISSN: 1520-6149
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name                     Order  Citations  PageRank
Manu Airaksinen          1      34         5.18
Lauri Juvela             2      35         8.29
Paavo Alku               3      728        98.07
Okko Johannes Räsänen    4      99         14.30