Paper Info

Title
Blind speech segmentation using spectrogram image-based features and Mel cepstral coefficients

Abstract
This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We treat the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e., discontinuities, are caused by phone boundaries. These discontinuities are found using a simple image destriping algorithm. To discover phone transitions that are less salient in the image, we compute spectral changes derived from the time evolution of the Mel cepstral parametrisation of speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood of a phone boundary being located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at a -3.26% over-segmentation rate, yielding an F-measure of 0.76 and an R-value of 0.80 on the TIMIT dataset.
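
The reported R-value can be related to the other two figures with a short calculation. The sketch below is not taken from the paper: it assumes that the reported accuracy is the boundary hit rate, and that the R-value follows the definition commonly used in the segmentation literature, which combines the hit rate (HR) and the over-segmentation rate (OS), both in percent. The function name `r_value` is illustrative only.

```python
import math


def r_value(hit_rate_pct: float, over_seg_pct: float) -> float:
    """R-value from hit rate and over-segmentation, both in percent.

    Assumed definition (not confirmed by the abstract):
        r1 = sqrt((100 - HR)^2 + OS^2)
        r2 = (-OS + HR - 100) / sqrt(2)
        R  = 1 - (|r1| + |r2|) / 200
    """
    # Distance from the ideal operating point (HR = 100, OS = 0).
    r1 = math.hypot(100.0 - hit_rate_pct, over_seg_pct)
    # Signed distance term that penalises over-segmentation used to inflate HR.
    r2 = (-over_seg_pct + hit_rate_pct - 100.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 200.0


# Figures quoted in the abstract, with accuracy assumed to be the hit rate.
print(round(r_value(75.59, -3.26), 2))
```

Under these assumptions the abstract's figures give a value of about 0.80, which agrees with the reported R-value.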

Year	2016
DOI	10.1109/SLT.2016.7846324
Venue	2016 IEEE Spoken Language Technology Workshop (SLT)
Keywords	blind segmentation, unsupervised segmentation, phoneme segmentation, destriping, image processing
Field	Mel-frequency cepstrum, TIMIT, Speech processing, Pattern recognition, Spectrogram, Computer science, Image processing, Speech recognition, Feature extraction, Image segmentation, Artificial intelligence, Speech segmentation
DocType	Conference
ISSN	2639-5479
ISBN	978-1-5090-4904-2
Citations	1
PageRank	0.39
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Adriana Stan	1	36	7.23
Cassia Valentini-Botinhao	2	208	18.41
Bogdan Orza	3	3	4.74
Mircea Giurgiu	4	11	5.19
