Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection - Citegraph

Paper Info

Title
Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection

Abstract
In Weakly-supervised Sound Event Detection (WSED), the ground truth of training data contains the presence or absence of each sound event only at the clip level (i.e., no frame-level annotations). Recently, WSED has been formulated under the multi-instance learning framework, and a critical component within this formulation is the design of the temporal pooling function. In this paper, we propose an adaptive hierarchical pooling (HiPool) for WSED, which combines the advantages of max pooling in audio tagging and weighted average pooling in audio localization through a novel hierarchical structure, and learns event-wise optimal pooling functions through continuous relaxation-based joint optimization. Extensive experiments on benchmark datasets show that HiPool outperforms current pooling methods and greatly improves the performance of WSED. HiPool is also highly general: it is ready to be plugged into any WSED model.
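
To make the role of the temporal pooling function concrete, the following minimal sketch shows how frame-level probabilities for one event class might be aggregated into a clip-level probability. It assumes linear softmax as the form of weighted average pooling, which the abstract does not specify. The function `two_level_pool` and its `segment_len` parameter are hypothetical names used only to illustrate the general idea of combining the two pooling types in a hierarchy. This page does not describe the actual HiPool structure or its continuous relaxation-based optimization, so this is not the authors' method.

```python
from typing import List


def max_pool(frame_probs: List[float]) -> float:
    """Clip-level probability is the single most confident frame."""
    return max(frame_probs)


def linear_softmax_pool(frame_probs: List[float]) -> float:
    """Weighted average in which each frame is weighted by its own probability.

    Assumption: linear softmax is used here as one common form of
    weighted average pooling; the abstract does not name a specific form.
    """
    total = sum(frame_probs)
    if total == 0:
        return 0.0
    return sum(p * p for p in frame_probs) / total


def two_level_pool(frame_probs: List[float], segment_len: int) -> float:
    """Hypothetical two-level pooling, for illustration only.

    Weighted average pooling inside fixed-length segments, then max
    pooling across segments. Not the published HiPool design.
    """
    segments = [
        frame_probs[i:i + segment_len]
        for i in range(0, len(frame_probs), segment_len)
    ]
    return max(linear_softmax_pool(segment) for segment in segments)


# Frame-level probabilities for one event class in one clip (made-up values).
example_probs = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1]
clip_max = max_pool(example_probs)
clip_avg = linear_softmax_pool(example_probs)
clip_two_level = two_level_pool(example_probs, segment_len=2)
```

In this toy example the two-level value falls between the weighted average and the max, which is the kind of trade-off between tagging and localization behaviour that the abstract alludes to.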

Year	DOI	Venue
2022	10.1145/3503161.3548097	International Multimedia Conference
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	4

Authors (4 rows)

Name	Order	Citations	PageRank
Lijian Gao	1	0	0.34
Ling Zhou	2	0	0.34
Qirong Mao	3	261	34.29
Ming Dong	4	0	0.34
