Learning filter banks within a deep neural network framework - Citegraph

Paper Info

Title
Learning filter banks within a deep neural network framework

Abstract
Mel-filter banks are commonly used in speech recognition, as they are motivated from theory related to speech production and perception. While features derived from mel-filter banks are quite popular, we argue that this filter bank is not really an appropriate choice as it is not learned for the objective at hand, i.e. speech recognition. In this paper, we explore replacing the filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network. Thus, the filter bank is learned to minimize cross-entropy, which is more closely tied to the speech recognition objective. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter bank learning approach, compared to having a fixed set of filters.
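
As a rough illustration of the idea in the abstract (not the authors' implementation), the NumPy sketch below treats the filter bank as a weight matrix that starts from triangular mel filters and is then updated by gradient descent on cross-entropy, together with the classifier that consumes its output. Several things here are assumptions made only for this example: a single softmax layer stands in for the deep network, the spectra are synthetic toy data, filter weights are kept non-negative by projection, and the names `mel_init` and `train_filterbank` are hypothetical.

```python
import numpy as np


def mel_init(n_filters, n_bins, sample_rate=16000.0):
    """Triangular mel filters, shape (n_filters, n_bins); only a starting point."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    nyquist = sample_rate / 2.0
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_filters + 2))
    freqs = np.linspace(0.0, nyquist, n_bins)
    lo, mid, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lo) / (mid - lo)
    falling = (hi - freqs) / (hi - mid)
    return np.clip(np.minimum(rising, falling), 0.0, None)


def train_filterbank(power, labels, n_filters, n_classes,
                     lr=0.01, steps=300, eps=1e-6):
    """Jointly fit filter weights F and a softmax classifier by gradient descent."""
    n, n_bins = power.shape
    F = mel_init(n_filters, n_bins)        # learnable filter bank layer
    V = np.zeros((n_filters, n_classes))   # classifier weights
    b = np.zeros(n_classes)                # classifier bias
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(steps):
        # Forward pass: filter energies -> log compression -> softmax.
        energy = power @ F.T
        feats = np.log(energy + eps)
        logits = feats @ V + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))
        # Backward pass: the cross-entropy gradient flows back into F.
        d_logits = (probs - onehot) / n
        d_feats = d_logits @ V.T
        d_energy = d_feats / (energy + eps)
        grad_F = d_energy.T @ power
        grad_V = feats.T @ d_logits
        grad_b = d_logits.sum(axis=0)
        V -= lr * grad_V
        b -= lr * grad_b
        # Projection keeps filter weights non-negative (assumption of this sketch).
        F = np.maximum(F - lr * grad_F, 0.0)
    return F, losses


# Toy two-class data: class 0 has extra energy in low bins, class 1 in high bins.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
power = rng.random((200, 64)) + 0.1
power[labels == 0, :32] += 2.0
power[labels == 1, 32:] += 2.0

F, losses = train_filterbank(power, labels, n_filters=8, n_classes=2)
print("cross-entropy before/after:", losses[0], losses[-1])
```

The fixed-filter baseline described in the abstract would correspond to skipping the update of `F` and training only the classifier on features from `mel_init`.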

Year	DOI	Venue
2013	10.1109/ASRU.2013.6707746	Automatic Speech Recognition and Understanding
Keywords	Field	DocType
channel bank filters,learning (artificial intelligence),neural nets,speech recognition,50-hour English broadcast news task,Mel-filter banks,WER,cross-entropy minimization,deep neural network framework,filter bank learning approach,speech perception,speech production,word error rate	Broadcasting,Pattern recognition,Computer science,Filter bank,Word error rate,Speech recognition,Time delay neural network,Artificial intelligence,Artificial neural network,Perception,Speech production	Conference
Citations	PageRank	References	Authors
35	1.41	13	4

Authors (4 rows)


Name	Order	Citations	PageRank
Tara N. Sainath	1	3497	232.43
B. Kingsbury	2	4175	335.43
Abdel-rahman Mohamed	3	3772	266.13
Bhuvana Ramabhadran	4	1779	153.83
