Title
Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection
Abstract
ABSTRACTIn Weakly-supervised Sound Event Detection (WSED), the ground truth of training data contains the presence or absence of each sound event only at the clip-level (i.e., no frame-level annotations). Recently, WSED has been formulated under the multi-instance learning framework, and a critical component within this formulation is the design of the temporal pooling function. In this paper, we propose an adaptive hierarchical pooling (HiPool) for WSED, which combines the advantages of max pooling in audio tagging and weighted average pooling in audio localization through a novel hierarchical structure and learns event-wise optimal pooling functions through continuous relaxation-based joint optimization. Extensive experiments on benchmark datasets show that HiPool outperforms the current pooling methods and greatly improves the performance of WSED. HiPool also has great generality - ready to be plugged into any WSED models.
Year
DOI
Venue
2022
10.1145/3503161.3548097
International Multimedia Conference
DocType
Citations 
PageRank 
Conference
0
0.34
References 
Authors
0
4
Name
Order
Citations
PageRank
Lijian Gao100.34
Ling Zhou200.34
Qirong Mao326134.29
Ming Dong400.34