Title
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization
Abstract
The direct-path relative transfer function (DP-RTF) is the ratio between the direct-path acoustic transfer functions of two microphone channels. Although the DP-RTF fully encodes the spatial cues of a sound source and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes to learn the DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of the DP-RTF. It consists of a branched convolutional neural network module that separately extracts the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better exploit the speech spectra for DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to form the input of the DP-RTF learning network. We train a single DP-RTF learning network on many different binaural arrays so that DP-RTF learning generalizes across arrays. This avoids time-consuming training data collection and network retraining for each new array, which is very useful in practical applications. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in noisy and reverberant environments, and good generalization to unseen binaural arrays.
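The definition of the DP-RTF as a ratio of direct-path transfer functions can be illustrated with a minimal free-field sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1/d attenuation and pure-delay model, the speed of sound, the function names, and in particular the real-valued encoding (level difference plus sine and cosine of the phase difference), which is only one plausible choice and may differ from the representation the authors regress to.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def direct_path_atf(distance_m, freq_hz):
    """Assumed free-field direct-path transfer function: 1/d gain and a pure delay."""
    delay_s = distance_m / SPEED_OF_SOUND
    return cmath.exp(-2j * math.pi * freq_hz * delay_s) / distance_m


def dp_rtf(d_ref_m, d_other_m, freq_hz):
    """DP-RTF: direct-path ATF of the second channel divided by that of the reference."""
    return direct_path_atf(d_other_m, freq_hz) / direct_path_atf(d_ref_m, freq_hz)


def real_valued(rtf):
    """Hypothetical real-valued encoding: level difference (dB), sin and cos of phase."""
    level_db = 20.0 * math.log10(abs(rtf))
    phase = cmath.phase(rtf)
    return (level_db, math.sin(phase), math.cos(phase))


# Source 1.00 m from the reference microphone and 1.10 m from the other one.
rtf = dp_rtf(d_ref_m=1.00, d_other_m=1.10, freq_hz=500.0)
features = real_valued(rtf)
```

In this model the magnitude of the DP-RTF depends only on the distance ratio and its phase only on the path-length difference, which is why the ratio carries the level and time-difference cues used for localization.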
Year
2021
DOI
10.1109/TASLP.2021.3120641
Venue
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Keywords
Location awareness, Feature extraction, Arrays, Speech enhancement, Spectrogram, Deep learning, Transfer functions, Direct-path relative transfer function, sound source localization, direction of arrival, deep neural network
DocType
Journal
Volume
29
Issue
1
ISSN
2329-9290
Citations
0
PageRank
0.34
References
14
Authors
3
Name, Order, Citations, PageRank
Bing Yang, 1, 44, 8.37
Hong Liu, 2, 747, 82.65
Xiaofei Li, 3, 103, 24.78