Title
Blind speech segmentation using spectrogram image-based features and Mel cepstral coefficients
Abstract
This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We consider the spectrogram of the waveform of an utterance as an image and hypothesize that its striping defects, i.e. discontinuities, appear due to phone boundaries. These discontinuities are found using a simple image destriping algorithm. To discover phone transitions which are not as salient in the image, we compute spectral changes derived from the time evolution of the Mel cepstral parametrisation of speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood of a phone boundary being located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at a -3.26% over-segmentation rate, yielding an F-measure of 0.76 and an R-value of 0.80 on the TIMIT dataset.
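The combination of an image-based cue and an acoustic cue described in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the destriping step is replaced here by a plain frame-to-frame distance on the log spectrogram, the Mel cepstral parametrisation is replaced by a DCT cepstrum without a Mel filterbank, and the mixing is a simple weighted sum. All function names, the frame length, the hop size, the weight and the threshold are hypothetical choices made for illustration only.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude spectrogram of shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def frame_change(features):
    """Euclidean distance between consecutive frames, scaled to [0, 1]."""
    d = np.linalg.norm(np.diff(features, axis=0), axis=1)
    d = np.concatenate([[0.0], d])  # the first frame has no predecessor
    return d / (d.max() + 1e-12)

def cepstrum(log_spec, n_coeffs=13):
    """DCT-II of the log spectrum (assumption: no Mel filterbank, unlike MFCCs)."""
    n_bins = log_spec.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bins)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_bins)
    return log_spec @ basis.T

def boundary_scores(signal, weight=0.5):
    """Weighted sum of an image-like cue and a cepstral cue, one score per frame."""
    log_spec = log_spectrogram(signal)
    image_score = frame_change(log_spec)               # stand-in for destriping
    acoustic_score = frame_change(cepstrum(log_spec))  # stand-in for MFCC changes
    return weight * image_score + (1.0 - weight) * acoustic_score

def pick_boundaries(scores, threshold=0.5):
    """Indices of local maxima of the score that exceed the threshold."""
    interior = scores[1:-1]
    is_peak = ((interior > scores[:-2]) & (interior >= scores[2:])
               & (interior > threshold))
    return np.flatnonzero(is_peak) + 1

# Usage on a synthetic signal: two pure tones joined at 0.5 s (16 kHz sampling).
sr = 16000
t = np.arange(sr) / sr
tones = np.where(t < 0.5, np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 2000 * t))
scores = boundary_scores(tones)
boundaries = pick_boundaries(scores)
```

On this synthetic signal the score should peak around the frames that straddle the tone change. Real speech would need a proper Mel filterbank, smoothing of the score, and a tuned threshold, none of which are specified in the abstract.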
Year: 2016
DOI: 10.1109/SLT.2016.7846324
Venue: 2016 IEEE Spoken Language Technology Workshop (SLT)
Keywords: blind segmentation, unsupervised segmentation, phoneme segmentation, destriping, image processing
Field: Mel-frequency cepstrum, TIMIT, Speech processing, Pattern recognition, Spectrogram, Computer science, Image processing, Speech recognition, Feature extraction, Image segmentation, Artificial intelligence, Speech segmentation
DocType: Conference
ISSN: 2639-5479
ISBN: 978-1-5090-4904-2
Citations: 1
PageRank: 0.39
References: 0
Authors: 4
Name                       Order  Citations  PageRank
Adriana Stan               1      36         7.23
Cassia Valentini-Botinhao  2      208        18.41
Bogdan Orza                3      3          4.74
Mircea Giurgiu             4      11         5.19