Title
Frequency Axis Pooling Method for Weakly Labeled Sound Event Detection and Classification
Abstract
The convolutional recurrent neural network (CRNN) has recently been widely used in weakly labeled sound event detection (SED) and audio tagging (AT) tasks. However, existing network designs may not make good use of the information along the frequency dimension, which can cause information loss or redundancy. We propose a frequency axis pooling method to further boost the representation power of the CRNN. Building on existing pooling functions, frequency axis pooling is applied to the feature map before it is fed to the recurrent neural network (RNN) in the CRNN. Compared with applying no pooling along the frequency axis, our method assigns different weights to different frequency dimensions during compression, which compresses frequency information better and reduces information redundancy. To evaluate the proposed method, three commonly used pooling functions are compared on the frequency axis using the DCASE 2017 Task 4 dataset. The experimental results show that reasonable compression of frequency information significantly improves performance on the AT and SED tasks. Among the three, frequency axis pooling based on linear softmax performs best on both tasks.
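As a rough illustration of the idea in the abstract, the sketch below applies linear softmax pooling along the frequency axis of a CNN feature map before it would be passed to an RNN. The abstract gives no architectural details, so the tensor layout (batch, channels, time, frequency), the function name `linear_softmax_pool`, and the array sizes are all assumptions made for this example. The pooling formula is the commonly used linear softmax definition, sum(x^2) / sum(x), which assumes non-negative inputs; whether the paper applies it in exactly this way is not confirmed here.

```python
import numpy as np


def linear_softmax_pool(x, axis=-1, eps=1e-7):
    """Linear softmax pooling along `axis`: sum(x**2) / sum(x).

    Each element is weighted by its own value, so larger activations
    contribute more. Assumes non-negative inputs (for example, activations
    after a ReLU or sigmoid); `eps` guards against division by zero.
    """
    x = np.asarray(x, dtype=np.float64)
    return (x * x).sum(axis=axis) / (x.sum(axis=axis) + eps)


# Hypothetical CNN output with layout (batch, channels, time, frequency).
rng = np.random.default_rng(0)
feature_map = rng.random((2, 8, 10, 4))  # values in [0, 1)

# Collapse the frequency axis: (2, 8, 10, 4) -> (2, 8, 10).
pooled = linear_softmax_pool(feature_map, axis=-1)

# Reorder to (batch, time, features), a typical RNN input layout.
rnn_input = pooled.transpose(0, 2, 1)
```

For non-negative inputs the pooled value lies between the mean and the maximum along the pooled axis, which is one way to read the abstract's remark that different frequency dimensions receive different weights during compression.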
Year: 2021
Venue: 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)
DocType: Conference
ISSN: 2309-9402
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name          Order  Citations  PageRank
Miao Liu      1      0          0.34
Jing Wang     2      0          3.72
Yujun Wang    3      0          0.34
Lidong Yang   4      0          0.34