Title
An Audio Scene Classification Framework With Embedded Filters And A DCT-Based Temporal Module
Abstract
Deep convolutional neural networks (DCNNs) have recently improved the performance of acoustic scene classification. However, the input features of the network are usually based on predefined, hand-tailored filters, which may not suit the specific task. To overcome this, we propose a hybrid framework that jointly trains the front-end filters and the back-end DCNN. In addition, a novel temporal module based on the discrete cosine transform (DCT) is inserted after the high-level feature map of the network, enabling us to utilize temporal information without reducing the number of training samples. Our single system, composed of the fine-tuned wavelet front-end and the DCNN back-end with the integrated DCT-based temporal module, achieves an accuracy of 79.20% on the DCASE17 evaluation set, an improvement of around 3% and 8% over the scalogram-DCNN and FBank-DCNN systems, respectively.
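The abstract does not describe how the DCT-based temporal module is built, so the following is only a rough sketch of the general idea of summarizing a feature map along time with a DCT. It assumes a feature map of shape (channels, time) and keeps the first few DCT-II coefficients per channel. The function names `dct_matrix` and `dct_temporal_pool`, the orthonormal scaling, and the choice to truncate coefficients are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def dct_matrix(n_frames: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis, shape (n_frames, n_frames).

    Row k holds the k-th cosine basis function sampled at n_frames points.
    """
    n = np.arange(n_frames)
    k = n.reshape(-1, 1)
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_frames))
    # Scale rows so that the basis is orthonormal.
    basis[0] *= np.sqrt(1.0 / n_frames)
    basis[1:] *= np.sqrt(2.0 / n_frames)
    return basis


def dct_temporal_pool(feature_map: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Hypothetical temporal module: project each channel onto the first
    n_coeffs DCT basis functions along the time axis.

    feature_map: array of shape (channels, time)
    returns:     flat vector of length channels * n_coeffs
    """
    n_frames = feature_map.shape[-1]
    basis = dct_matrix(n_frames)[:n_coeffs]   # (n_coeffs, time)
    coeffs = feature_map @ basis.T            # (channels, n_coeffs)
    return coeffs.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((16, 32))      # 16 channels, 32 time frames
    pooled = dct_temporal_pool(fmap, n_coeffs=4)
    print(pooled.shape)
```

Under these assumptions, the low-order coefficients capture slow temporal trends across the whole clip, so the classifier sees time structure without the clip being cut into shorter training segments.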
Year: 2019
DOI: 10.1109/icassp.2019.8683636
Venue: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Keywords: Acoustic scene classification, embedded filters, joint-training, DCT-based temporal module
Field: Pattern recognition, Computer science, Convolutional neural network, Discrete cosine transform, Artificial intelligence, Wavelet
DocType: Conference
ISSN: 1520-6149
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name              Order   Citations   PageRank
Hangting Chen     1       2           2.39
Pengyuan Zhang    2       50          19.46
Yonghong Yan      3       10          6.40