Title
Zero-Shot Single-Microphone Sound Classification and Localization in a Building Via the Synthesis of Unseen Features
Abstract
We propose a learning-based approach for identifying the type and position of sounds using a single microphone in a real-world building. We treat this as a joint classification problem: we predict the exact position of each sound while classifying its type, which is assumed to belong to a pre-defined set of sound types. The main difficulty is that, while types are readily classified under supervised learning frameworks with one-hot encoded labels, it is difficult to predict the exact position of a sound that originates from a position not seen during training. To address this discrepancy, we formulate position identification as a zero-shot learning problem, inspired by the human ability to perceive new concepts from previously learned ones. We extract feature representations from audio data and vectorize the type and position of the sound source as 'type/position-aware attributes' instead of labeling each class with a simple one-hot vector. We then train a generative model to bridge the extracted features and the attributes by learning the class-invariant structure, so that knowledge transfers from seen to unseen classes through their attributes; specifically, generative adversarial networks are conditioned on the class embeddings. The proposed methods are evaluated on SNU-B36-EX, a real-world indoor noise dataset collected inside a building.
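The contrast between one-hot labels and attribute vectors can be illustrated with a minimal sketch. It is not the authors' encoding: the sound type names, the two position coordinates (floor and horizontal offset), their ranges, and the helper names `class_attribute` and `squared_distance` are all hypothetical and chosen only for illustration. The sketch shows why an attribute vector can describe a position that never appeared in training, which a one-hot label cannot do.

```python
# Hypothetical sound types and position ranges; none of these values
# are taken from the paper or from the SNU-B36-EX dataset.
SOUND_TYPES = ["type_a", "type_b", "type_c"]
MAX_FLOOR = 3         # assumed number of floors
MAX_OFFSET_M = 12.0   # assumed largest horizontal offset, in metres


def class_attribute(sound_type, floor, offset_m):
    """Return an illustrative type/position-aware attribute vector.

    The type part is one-hot. The position part is two numbers scaled
    to [0, 1], so nearby positions get nearby vectors.
    """
    type_part = [1.0 if t == sound_type else 0.0 for t in SOUND_TYPES]
    position_part = [floor / MAX_FLOOR, offset_m / MAX_OFFSET_M]
    return type_part + position_part


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Two positions assumed to be seen in training, and one assumed unseen.
seen_near = class_attribute("type_a", floor=1, offset_m=0.0)
seen_far = class_attribute("type_a", floor=1, offset_m=12.0)
unseen_mid = class_attribute("type_a", floor=1, offset_m=6.0)

# The unseen class still has a well-defined attribute vector, and it
# lies halfway between the two seen classes in attribute space.
print(unseen_mid)
print(squared_distance(unseen_mid, seen_near),
      squared_distance(unseen_mid, seen_far))
```

In a zero-shot pipeline of the kind the abstract describes, vectors like these would be the conditioning input to the generative model, which would then synthesize features for the unseen classes. That generative step is omitted here.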
Year: 2022
DOI: 10.1109/TMM.2021.3079705
Venue: IEEE TRANSACTIONS ON MULTIMEDIA
Keywords: Location awareness, Microphones, Buildings, Feature extraction, Training, Reverberation, Data models, Generative adversarial network, sound classification, sound source localization, zero-shot learning
DocType: Journal
Volume: 24
ISSN: 1520-9210
Citations: 0
PageRank: 0.34
References: 35
Authors: 4
Name            Order   Citations   PageRank
Seungjun Lee    1       21          6.20
Haesang Yang    2       0           0.34
Hwiyong Choi    3       0           0.68
Woojae Seong    4       1           2.74