Title
Time-Frequency Kernel-Based CNN for Speech Recognition
Abstract
We propose a novel approach to generating time-frequency kernel-based deep convolutional neural networks (CNNs) for robust speech recognition. In the 2D convolution, we treat shifts along the time axis and the frequency axis of the speech feature representation differently, so as to achieve a degree of invariance to small frequency shifts while expanding the time context of the speech input without smearing the time positions of phone segments. The 2D-kernel approach allows easy implementation of deep CNNs. We present experimental results on the speaker-independent phone recognition tasks of TIMIT and FFMTIMIT, where the latter was acquired using a far-field microphone and the speech data are noisy. Our results demonstrate that the proposed time-frequency kernel-based CNN gives consistent phone error reductions over the frequency-domain CNN and the DNN on both TIMIT and FFMTIMIT, with larger gains when noisy speech is recognized using models trained on clean speech.
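The asymmetric treatment of the two axes can be illustrated with a small sketch. This is not the authors' implementation. It only assumes one common way to get this behaviour: a kernel that spans both time and frequency, followed by max-pooling along the frequency axis only, so that small frequency shifts are absorbed while time positions are left unpooled. The input size, kernel size, pooling width, and the function names conv2d_valid and freq_max_pool are all hypothetical.

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid 2D cross-correlation of x (time, freq) with kernel k (kt, kf)."""
    T, F = x.shape
    kt, kf = k.shape
    out = np.empty((T - kt + 1, F - kf + 1))
    for t in range(out.shape[0]):
        for f in range(out.shape[1]):
            out[t, f] = np.sum(x[t:t + kt, f:f + kf] * k)
    return out

def freq_max_pool(y, pool):
    """Max-pool along the frequency axis only; the time axis is not pooled."""
    T, F = y.shape
    F_out = F // pool
    return y[:, :F_out * pool].reshape(T, F_out, pool).max(axis=2)

rng = np.random.default_rng(0)
x = rng.standard_normal((15, 40))  # assumed input: 15 frames x 40 filterbank channels
k = rng.standard_normal((5, 8))    # assumed kernel: 5 frames x 8 channels

conv_out = conv2d_valid(x, k)          # shape (11, 33)
h = freq_max_pool(conv_out, pool=3)    # shape (11, 11): frequency reduced, time kept
print(conv_out.shape, h.shape)
```

In this sketch the kernel's extent along time widens the context seen by each output unit, while the number of time steps is the same before and after pooling.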
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
time-frequency kernels, convolutional neural network, robust speech recognition
Field
Kernel (linear algebra), Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Time–frequency analysis
DocType
Conference
Citations
2
PageRank
0.35
References
7
Authors
3
Name
Order
Citations
PageRank
Tuo Zhao122240.58
Yunxin Zhao2807121.74
Xin Chen31169.64