Abstract |
---|
Vowel durations are most often used in studies addressing specific issues in phonetics. Such studies have thus far been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for accurate automatic measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN outperforms the DBN and that the CNN and the HMM-based forced aligner perform comparably; however, neither of them yields the same predictions as models fit to manually annotated data. |
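The abstract says the network is trained at the frame level but does not say how per-frame decisions become a single duration. The code below is a minimal sketch of one possible decoding rule. It assumes that the classifier outputs one vowel probability per frame, that frames use a fixed 10 ms hop, that the threshold is 0.5, and that the vowel is the longest contiguous run of above-threshold frames. The function names `longest_vowel_run` and `vowel_duration_ms` and all parameter values are hypothetical and are not the authors' method.

```python
from typing import Sequence, Tuple


def longest_vowel_run(frame_probs: Sequence[float], threshold: float = 0.5) -> Tuple[int, int]:
    """Hypothetical decoder: return (start, end) frame indices, end exclusive,
    of the longest contiguous run of frames whose vowel probability exceeds
    `threshold`. Returns (0, 0) when no frame qualifies."""
    best_start, best_len = 0, 0
    run_start = None  # start index of the run currently being scanned
    for i, p in enumerate(frame_probs):
        if p > threshold:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # run broken by a non-vowel frame
    return best_start, best_start + best_len


def vowel_duration_ms(frame_probs: Sequence[float], hop_ms: float = 10.0,
                      threshold: float = 0.5) -> float:
    """Duration of the decoded vowel, assuming a fixed frame hop of `hop_ms`."""
    start, end = longest_vowel_run(frame_probs, threshold)
    return (end - start) * hop_ms


if __name__ == "__main__":
    # Toy per-frame vowel probabilities for one CVC segment (made up, not from the study).
    probs = [0.1, 0.2, 0.9, 0.8, 0.95, 0.3, 0.7, 0.1]
    print(longest_vowel_run(probs), vowel_duration_ms(probs))
```

In this toy input, the isolated above-threshold frame at index 6 is ignored because a longer run occurs at frames 2 to 4. This single-segment rule fits the CVC setting, where each input is expected to contain exactly one vowel.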
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/MLSP.2015.7324331 | 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP) |
Keywords | Field | DocType |
---|---|---|
vowel duration measurement,convolutional neural networks,deep belief networks,hidden Markov models,forced alignment | Convolutional neural network,Computer science,Deep belief network,Phonetics,Time delay neural network,Artificial intelligence,Deep learning,Artificial neural network,Pattern recognition,Speech recognition,Vowel,Hidden Markov model,Machine learning | Conference |
Volume | ISSN | Citations |
---|---|---|
2015 | 1551-2541 | 1 |
PageRank | References | Authors |
---|---|---|
0.44 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yossi Adi | 1 | 10 | 2.64 |
Joseph Keshet | 2 | 925 | 69.84 |
Matthew Goldrick | 3 | 12 | 3.19 |