Title
On Learning Disentangled Representation for Acoustic Event Detection
Abstract
Polyphonic Acoustic Event Detection (AED) is a challenging task because sounds from different events overlap in the mixture, and features extracted from the mixture do not match features computed from sounds in isolation, leading to suboptimal AED performance. In this paper, we propose a supervised β-VAE model for AED, which adds a novel event-specific disentangling loss to the objective function of disentangled learning. By incorporating either latent factor blocks or latent attention during disentangling, the supervised β-VAE learns a set of discriminative features for each event. Extensive experiments on benchmark datasets show that our approach outperforms the current state of the art (the top-1 performers in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 AED challenge). The supervised β-VAE also performs well on challenging AED tasks with a large variety of events and imbalanced data.
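The objective described in the abstract can be illustrated with a minimal sketch: the standard β-VAE loss (reconstruction error plus a β-weighted KL term) extended with an event-specific disentangling penalty over per-event latent blocks. The per-event block partitioning, the squared-activation penalty on blocks of absent events, and all function names below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """Per-sample KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Standard beta-VAE objective: reconstruction + beta * KL."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # Gaussian reconstruction term
    return np.mean(recon + beta * kl_diag_gaussian(mu, logvar))

def event_disentangle_loss(mu, labels, n_events, block_dim):
    """Hypothetical event-specific term: partition the latent mean into one
    block per event class and penalize the activation energy of blocks whose
    event is absent, pushing each block to encode only its own event.

    mu:     (batch, n_events * block_dim) latent means
    labels: (batch, n_events) binary event-presence labels
    """
    blocks = mu.reshape(mu.shape[0], n_events, block_dim)
    energy = np.sum(blocks ** 2, axis=-1)              # (batch, n_events)
    return np.mean(np.sum((1.0 - labels) * energy, axis=-1))
```

A supervised training loss would then combine the two, e.g. `beta_vae_loss(...) + lam * event_disentangle_loss(...)`, with the trade-off weight `lam` tuned on validation data.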
Year
2019
DOI
10.1145/3343031.3351086
Venue
Proceedings of the 27th ACM International Conference on Multimedia
Keywords
acoustic event detection, disentangled latent representation, supervised variational autoencoder
Field
Computer vision, Computer science, Speech recognition, Artificial intelligence, Acoustic event detection, Discriminative model
DocType
Conference
ISBN
978-1-4503-6889-6
Citations
0
PageRank
0.34
References
0
Authors
5
Order  Name                 Citations  PageRank
1      Lijian Gao           0          1.01
2      Qirong Mao           261        34.29
3      Ming Dong            849        49.17
4      Yu Jing              0          0.34
5      Ratna Babu Chinnam   210        18.59