Phonetic segmentation of speech using STEP and t-SNE - Citegraph

Paper Info

Title
Phonetic segmentation of speech using STEP and t-SNE

Abstract
This paper presents a first attempt at phoneme-level segmentation of speech based on a perceptual representation, the Spectro Temporal Excitation Pattern (STEP), and a dimensionality reduction technique, t-Distributed Stochastic Neighbour Embedding (t-SNE). The method searches for the true phonetic boundaries in the vicinity of those produced by an HMM-based segmentation. It looks for the perceptually salient spectral changes that occur at these phonetic transitions, and exploits t-SNE's ability to capture both the local and the global structure of the data. The method is intended to be usable for any language and is therefore not tailored to any particular dataset or language. Results show that this simple approach improves the segmentation accuracy of unvoiced phonemes by 4% within a 5 ms margin and by 5% within a 10 ms margin. For voiced phonemes, however, accuracy drops slightly.
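
The accuracy figures in the abstract are quoted "within a margin" of 5 ms or 10 ms. As a rough illustration of what such a figure measures, the sketch below counts the fraction of predicted boundaries that fall within a given tolerance of a reference boundary. It is not the authors' evaluation code: the function name boundary_accuracy, the one-to-one pairing of predicted and reference boundaries (plausible when both segmentations share the same phone sequence), and all the boundary times are assumptions made up for illustration.

```python
def boundary_accuracy(reference_ms, predicted_ms, margin_ms):
    """Fraction of predicted boundaries within margin_ms of their reference.

    Assumes (hypothetically) that both lists describe the same phone
    sequence, so the i-th prediction pairs with the i-th reference.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1
        for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= margin_ms
    )
    return hits / len(reference_ms)


# Made-up boundary times in milliseconds, for illustration only.
reference = [120, 245, 390, 510]   # manually labelled boundaries
hmm_based = [128, 240, 402, 511]   # baseline segmentation
refined = [123, 243, 396, 512]     # boundaries after a refinement step

for name, predicted in (("baseline", hmm_based), ("refined", refined)):
    acc_5 = boundary_accuracy(reference, predicted, margin_ms=5)
    acc_10 = boundary_accuracy(reference, predicted, margin_ms=10)
    print(f"{name}: {acc_5:.2f} within 5 ms, {acc_10:.2f} within 10 ms")
```

Reporting the same score at two margins, as the abstract does, shows whether a refinement step sharpens boundaries that were already close or mainly pulls in the larger errors.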

Year: 2015
DOI: 10.1109/SPED.2015.7343105
Venue: 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
Keywords: phonetic segmentation, STEP, t-SNE, HMM acoustic model, k-Means
Field: Scale-space segmentation, Embedding, Global structure, Dimensionality reduction, Pattern recognition, Computer science, Segmentation, Stochastic process, Speech recognition, Artificial intelligence, Hidden Markov model
DocType: Conference
Citations: 0
PageRank: 0.34
References: 9
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Adriana Stan	1	36	7.23
Cassia Valentini-Botinhao	2	208	18.41
Mircea Giurgiu	3	11	5.19
Simon King	4	19	5.11
