Title | ||
---|---|---|
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization |
Abstract | ||
---|---|---|
The direct-path relative transfer function (DP-RTF) is the ratio between the direct-path acoustic transfer functions of two microphone channels. Though the DP-RTF fully encodes the spatial cues of a sound source and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes to learn the DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of the DP-RTF. It consists of a branched convolutional neural network module that separately extracts the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better exploit the speech spectra in aid of DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to form the input of the DP-RTF learning network. We train a single DP-RTF learning network using many different binaural arrays to enable DP-RTF learning to generalize across arrays. This avoids time-consuming training data collection and network retraining for a new array, which is very useful in practical applications. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in noisy and reverberant environments, and good generalization to unseen binaural arrays. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/TASLP.2021.3120641 | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING |
Keywords | DocType | Volume |
Location awareness, Feature extraction, Arrays, Speech enhancement, Spectrogram, Deep learning, Transfer functions, Direct-path relative transfer function, sound source localization, direction of arrival, deep neural network | Journal | 29 |
Issue | ISSN | Citations |
1 | 2329-9290 | 0 |
PageRank | References | Authors |
0.34 | 14 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bing Yang | 1 | 44 | 8.37 |
Hong Liu | 2 | 747 | 82.65 |
Xiaofei Li | 3 | 103 | 24.78 |