Title | ||
---|---|---|
An Audio Scene Classification Framework with Embedded Filters and a DCT-Based Temporal Module |
Abstract | ||
---|---|---|
Deep convolutional neural networks (DCNNs) have recently improved the performance of acoustic scene classification. However, the input features of the network are usually based on predefined, hand-tailored filters, which may not suit the specific task. To overcome this, we propose a hybrid framework that jointly trains the front-end filters and the back-end DCNN. In addition, a novel temporal module based on the discrete cosine transform (DCT) is inserted after the high-level feature map of the network, enabling us to utilize temporal information without reducing the number of training samples. Our single system, composed of the fine-tuned wavelet front-end and the DCNN back-end with the integrated DCT-based temporal module, achieves an accuracy of 79.20% on the DCASE17 evaluation set, an improvement of around 3% and 8% in accuracy over the scalogram-DCNN and FBank-DCNN systems, respectively. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8683636 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | Field | DocType |
Acoustic scene classification, embedded filters, joint-training, DCT-based temporal module | Pattern recognition, Computer science, Convolutional neural network, Discrete cosine transform, Artificial intelligence, Wavelet | Conference |
ISSN | Citations | PageRank |
1520-6149 | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hangting Chen | 1 | 2 | 2.39 |
Pengyuan Zhang | 2 | 50 | 19.46 |
Yonghong Yan | 3 | 10 | 6.40 |