Title
Importance of non-uniform prosody modification for speech recognition in emotion conditions.
Abstract
A mismatch between training and operating environments degrades the performance of automatic speech recognition (ASR) systems. One major source of this mismatch is the presence of expressive (emotive) speech in operational environments. Emotions in speech mainly alter the prosody parameters of pitch, duration and energy. This work aims to improve ASR performance on emotive speech without modifying the existing ASR system. Instead, a neutral version of the emotive speech is generated using uniform and non-uniform prosody modification methods and passed to the recognizer. The prosody modification of pitch, duration and energy is achieved by tuning the modification factor values according to the relative differences between the neutral and emotional data sets. The IITKGP-SESC corpus is used to build the ASR system, which is evaluated for three emotions: anger, happiness and compassion. ASR performance improves when the prosody-modified emotive utterance is used for recognition in place of the original emotive utterance. The non-uniform prosody modification methods yield an average accuracy improvement of around 5%.
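The abstract does not specify how the modification factors are computed or how the non-uniform variant divides an utterance. The sketch below is a minimal illustration under stated assumptions, not the authors' method.

It assumes the following:
- Each factor is the ratio of the neutral mean to the emotional mean of a frame-level contour (pitch or energy), computed over parallel data.
- "Uniform" modification applies one factor to the whole contour.
- "Non-uniform" modification applies one factor per fixed, equal-length region of the utterance.

All function names and the equal-region split are hypothetical. The code only rescales contours. Resynthesizing speech with the new prosody would need a separate prosody-modification technique, which is not shown.

```python
from statistics import mean


def voiced(values):
    """Drop zero-valued frames (e.g. unvoiced F0 frames) before averaging."""
    return [v for v in values if v > 0]


def modification_factor(neutral_values, emotional_values):
    """Hypothetical factor: neutral mean / emotional mean over nonzero frames."""
    neu, emo = voiced(neutral_values), voiced(emotional_values)
    if not neu or not emo:
        return 1.0  # nothing to compare: leave this region unchanged
    return mean(neu) / mean(emo)


def split_regions(contour, n_regions):
    """Split a contour into n_regions contiguous, nearly equal-length regions
    (an assumed segmentation; the paper's actual units may differ)."""
    size = len(contour)
    bounds = [i * size // n_regions for i in range(n_regions + 1)]
    return [contour[bounds[i]:bounds[i + 1]] for i in range(n_regions)]


def uniform_modify(contour, factor):
    """Uniform modification: scale every frame by the same factor."""
    return [v * factor for v in contour]


def non_uniform_factors(neutral, emotional, n_regions):
    """One factor per region, pooled over parallel neutral/emotional contours."""
    factors = []
    for r in range(n_regions):
        neu = [v for c in neutral for v in split_regions(c, n_regions)[r]]
        emo = [v for c in emotional for v in split_regions(c, n_regions)[r]]
        factors.append(modification_factor(neu, emo))
    return factors


def non_uniform_modify(contour, factors):
    """Non-uniform modification: scale each region by its own factor."""
    out = []
    for region, factor in zip(split_regions(contour, len(factors)), factors):
        out.extend(v * factor for v in region)
    return out


if __name__ == "__main__":
    # Toy F0 contours in Hz (0 = unvoiced frame); values are illustrative only.
    neutral_f0 = [[0, 120, 125, 130, 0, 118]]
    angry_f0 = [[0, 180, 200, 210, 0, 150]]
    f = modification_factor(neutral_f0[0], angry_f0[0])
    print("uniform:", uniform_modify(angry_f0[0], f))
    factors = non_uniform_factors(neutral_f0, angry_f0, n_regions=2)
    print("non-uniform:", non_uniform_modify(angry_f0[0], factors))
```

The same ratio could be applied to lists of utterance or segment durations to obtain a duration factor. Applying that factor to the waveform, however, requires a time-scale modification method outside this sketch.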
Year
2017
Venue
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Keywords
prosody, automatic speech recognition, IITKGP-SESC, uniform prosody modification, non-uniform prosody modification
Field
Prosody, Computer science, Utterance, Speech recognition, Anger, Speech recognition performance, Emotive, Hidden Markov model
DocType
Conference
ISSN
2309-9402
Citations
1
PageRank
0.35
References
0
Authors
4
Name | Order | Citations | PageRank
V. V. Vidyadhara Raju | 1 | 1 | 0.35
hari krishna vydana | 2 | 16 | 4.67
Gangashetty, S.V. | 3 | 20 | 5.71
Anil Kumar Vuppala | 4 | 27 | 5.71