Acoustic Feature Comparison For Different Speaking Rates

Paper Info

Title
Acoustic Feature Comparison For Different Speaking Rates

Abstract
This paper investigates the effect of speaking rate variation on the task of frame classification. This task is indicative of performance on phoneme and word recognition and is a first step towards designing voice-controlled interfaces. Different speaking rates cause different dynamics: for example, speaking rate variations change both the formant frequencies and their transition tracks. A word spoken at normal speed is recognized more often than the same word spoken by the same speaker at a much faster or slower pace. It is thus imperative to design interfaces that take variability in speaking rate into account. To better incorporate speaker variability into digital devices, we study the effect of (a) feature selection and (b) the choice of network architecture on variable speaking rates. Four different features are evaluated on multiple configurations of Deep Neural Network (DNN) architectures. The findings show that log Filter-Bank Energies (FBE) outperformed the other acoustic features not only at the normal speaking rate but also at slow and fast speaking rates.
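
For readers unfamiliar with the two features named in this record (FBE and MFCC), the following is a minimal NumPy sketch of how log filter-bank energies are commonly computed and how MFCCs can be derived from them with a discrete cosine transform. It is not taken from the paper: the function names (`log_fbe`, `mfcc_from_log_fbe`, `mel_filterbank`) are hypothetical, and the 16 kHz sample rate, 25 ms frames, 10 ms hop, 40 filters and 13 cepstral coefficients are assumed illustrative defaults, not the authors' settings.

```python
import numpy as np

def hz_to_mel(hz):
    # A widely used mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_fbe(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log filter-bank energies, one row per frame (assumed illustrative defaults)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)  # small floor avoids log(0)

def mfcc_from_log_fbe(log_energies, n_ceps=13):
    """Unnormalized DCT-II of the log energies, keeping the first n_ceps coefficients."""
    n_filters = log_energies.shape[1]
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)  # stand-in for real speech
    fbe = log_fbe(one_second)
    print(fbe.shape, mfcc_from_log_fbe(fbe).shape)
```

In this sketch the MFCCs are a linear transform of the log filter-bank energies, so the two features differ mainly in decorrelation and dimensionality rather than in the underlying spectral information.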

Year	DOI	Venue
2018	10.1007/978-3-319-91250-9_14	HUMAN-COMPUTER INTERACTION: INTERACTION TECHNOLOGIES, HCI INTERNATIONAL 2018, PT III
Keywords	Field	DocType
Intrinsic variations, Speaking rate, Acoustic features, FBE, MFCC, DNN	Mel-frequency cepstrum, Pace, Feature selection, Computer science, Word recognition, Network architecture, Speech recognition, Human–computer interaction, Artificial neural network, Formant	Conference
Volume	ISSN	Citations
10903	0302-9743	0
PageRank	References	Authors
0.34	14	4

Authors (4 rows)

Name	Order	Citations	PageRank
Abdolreza Sabzi Shahrebabaki	1	1	3.41
Ali Shariq Imran	2	49	17.47
Negar Olfati	3	1	2.40
Torbjørn Svendsen	4	161	21.26
