Title
Weakly supervised CRNN system for sound event detection with large-scale unlabeled in-domain data.
Abstract
Sound event detection (SED) is typically posed as a supervised learning problem that requires training data with strong temporal labels of sound events. However, producing datasets with strong labels usually incurs a prohibitive labor cost, which limits the practical application of supervised SED methods. Recent advances in SED focus on detecting sound events by taking advantage of weakly labeled or unlabeled training data. In this paper, we propose a joint framework to solve the SED task using large-scale unlabeled in-domain data. In particular, a state-of-the-art general audio tagging model is first employed to predict weak labels for the unlabeled data. A weakly supervised architecture based on the convolutional recurrent neural network (CRNN) is then developed to predict strong annotations of sound events with the aid of the unlabeled data and its predicted labels. We find that SED performance generally increases as more unlabeled data is added to the training set. To address the noisy-label problem of the unlabeled data, an ensemble strategy is applied to increase system robustness. The proposed system is evaluated on the SED dataset of the DCASE 2018 challenge. It reaches an F1-score of 21.0%, an improvement of 10% over the baseline system.
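The pipeline described in the abstract (weak labels predicted for unlabeled clips, then an ensemble of detectors producing strong annotations) can be illustrated with a minimal sketch. The abstract does not state the thresholds, the ensemble rule, or the post-processing, so everything here is an assumption made for illustration: the function names, the 0.5 thresholds, the averaging of frame-level probabilities across models, and the frame hop are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def predict_weak_labels(clip_probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Turn clip-level tagging probabilities into weak (clip-level) labels.

    Assumption: a class is kept when its probability reaches the threshold.
    """
    return sorted(name for name, p in clip_probs.items() if p >= threshold)


def ensemble_frame_probs(model_outputs: List[List[float]]) -> List[float]:
    """Average frame-level probabilities of one class across several models.

    Assumption: simple averaging; the abstract only says "an ensemble strategy".
    """
    n_models = len(model_outputs)
    n_frames = len(model_outputs[0])
    return [sum(m[t] for m in model_outputs) / n_models for t in range(n_frames)]


def frames_to_events(
    frame_probs: List[float], threshold: float = 0.5, hop_seconds: float = 0.02
) -> List[Tuple[float, float]]:
    """Convert frame-level probabilities into (onset, offset) pairs in seconds."""
    events = []
    onset = None
    for t, p in enumerate(frame_probs):
        active = p >= threshold
        if active and onset is None:
            onset = t  # event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_seconds, t * hop_seconds))
            onset = None
    if onset is not None:  # event still active at the end of the clip
        events.append((onset * hop_seconds, len(frame_probs) * hop_seconds))
    return events


if __name__ == "__main__":
    # Hypothetical tagging output for one unlabeled clip.
    print(predict_weak_labels({"Speech": 0.9, "Dog": 0.2, "Dishes": 0.5}))
    # Hypothetical frame-level outputs of two detectors for the class "Speech".
    averaged = ensemble_frame_probs([[0.2, 0.8, 0.9, 0.1], [0.4, 0.6, 0.7, 0.3]])
    print(frames_to_events(averaged, hop_seconds=1.0))
```

In this sketch the weak labels would be used as training targets for the weakly supervised detector, and averaging across models is one simple way to dampen the effect of noisy predicted labels on any single model.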
Year: 2018
Venue: arXiv: Sound
DocType: Journal
Volume: abs/1811.00301
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name              Order   Citations   PageRank
De-Zhi Wang       1       10          6.06
Lilun Zhang       2       75          7.84
Changchun Bao     3       0           0.34
Kele Xu           4       46          21.80
Boqing Zhu        5       5           2.47
Qiuqiang Kong     6       68          18.75