Abstract |
---|
Accurate and automatic phonetic segmentation is crucial for several speech-based applications such as phone-level articulation analysis and error detection, speech synthesis, annotation, speech recognition and emotion recognition. In this paper we examine the effectiveness of visual features, obtained by processing the spectrogram of a speech utterance as an image, for phonetic segmentation. Further, we propose a mechanism to combine knowledge from the visual and perceptual domains for automatic phonetic segmentation. This process can be considered analogous to manual phonetic segmentation. The technique was evaluated on the TIMIT American English corpus. Experimental results show significant improvements in phonetic segmentation, especially at the lower tolerances of 5, 10 and 15 ms; for a 10 ms tolerance, an absolute improvement of 8.29% is observed on TIMIT. |
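
The abstract does not describe the feature extraction in detail. The following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes a precomputed magnitude spectrogram (for example a multi-taper one, as the keywords suggest) stored as a list of frames. It uses a 3x3 Sobel kernel along the time axis as the edge detector and picks local maxima of the per-frame edge strength as boundary candidates. It then scores those candidates with a tolerance-based boundary accuracy of the kind behind the 5, 10 and 15 ms figures. The function names, the peak-picking rule, the 10% relative threshold and the one-to-one greedy matching are all hypothetical choices. The sketch also omits the perceptual-domain knowledge that the paper combines with the visual features.

```python
from typing import List, Sequence

# Hypothetical illustration only: not the method of the paper.


def temporal_edge_strength(spec: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame edge strength of a spectrogram treated as an image.

    spec[t][f] is the magnitude of frame t, frequency bin f (assumed precomputed).
    A 3x3 Sobel kernel that differentiates along time and smooths (1, 2, 1)
    along frequency is applied at every interior pixel; absolute responses are
    summed over frequency. Border frames are left at 0.
    """
    n_frames = len(spec)
    n_bins = len(spec[0]) if n_frames else 0
    strength = [0.0] * n_frames
    for t in range(1, n_frames - 1):
        total = 0.0
        for f in range(1, n_bins - 1):
            nxt = spec[t + 1][f - 1] + 2 * spec[t + 1][f] + spec[t + 1][f + 1]
            prv = spec[t - 1][f - 1] + 2 * spec[t - 1][f] + spec[t - 1][f + 1]
            total += abs(nxt - prv)
        strength[t] = total
    return strength


def pick_boundaries(strength: Sequence[float], frame_shift_ms: float,
                    rel_threshold: float = 0.1) -> List[float]:
    """Return boundary times (ms) at local maxima of the edge strength.

    Assumed rule: a frame is a peak if it is >= its left neighbour and > its
    right neighbour (so a flat two-frame plateau yields a single boundary) and
    its strength is at least rel_threshold times the global maximum.
    """
    if not strength:
        return []
    peak_floor = rel_threshold * max(strength)
    times = []
    for t in range(1, len(strength) - 1):
        s = strength[t]
        if s > 0 and s >= peak_floor and s >= strength[t - 1] and s > strength[t + 1]:
            times.append(t * frame_shift_ms)
    return times


def boundary_accuracy(reference_ms: Sequence[float], detected_ms: Sequence[float],
                      tolerance_ms: float) -> float:
    """Percentage of reference boundaries matched by a detection within +/- tolerance.

    Matching is one-to-one: each reference boundary, in time order, takes the
    closest still-unused detection inside its tolerance window.
    """
    if not reference_ms:
        return 0.0
    unused = sorted(detected_ms)
    hits = 0
    for ref in sorted(reference_ms):
        candidates = [d for d in unused if abs(d - ref) <= tolerance_ms]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(best)
            hits += 1
    return 100.0 * hits / len(reference_ms)


if __name__ == "__main__":
    # Toy 30-frame, 8-bin "spectrogram" whose spectrum changes at frames 10 and 20.
    spec = ([[0.0] * 8 for _ in range(10)]
            + [[1.0] * 8 for _ in range(10)]
            + [[0.2] * 8 for _ in range(10)])
    detected = pick_boundaries(temporal_edge_strength(spec), frame_shift_ms=10.0)
    print(detected)  # [100.0, 200.0]
    # Hypothetical reference boundaries; two of the three fall within 10 ms.
    print(boundary_accuracy([100.0, 195.0, 260.0], detected, tolerance_ms=10.0))
```

With a 10 ms frame shift, the toy spectrogram yields boundaries at 100 ms and 200 ms, so two of the three hypothetical reference boundaries are matched at a 10 ms tolerance, but only one at a 2 ms tolerance. Reporting accuracy at several tolerances, as the abstract does, exposes exactly this sensitivity to small timing offsets.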
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-64206-2_44 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
---|---|---|
Unsupervised phonetic segmentation, Edge detection, Multi-taper, Visual phonetic segmentation | TIMIT, Speech synthesis, Segmentation, Spectrogram, Edge detection, Computer science, Utterance, Speech recognition, Phone, American English, Natural language processing, Artificial intelligence | Conference |
Volume | ISSN | Citations |
---|---|---|
10415 | 0302-9743 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 11 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bhavik B. Vachhani | 1 | 22 | 4.69 |
Chitralekha Bhat | 2 | 2 | 3.13 |
Sunil Kumar Kopparapu | 3 | 42 | 25.18 |