Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization - Citegraph

Paper Info

Title
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization

Abstract
Direct-path relative transfer function (DP-RTF) refers to the ratio between the direct-path acoustic transfer functions of two microphone channels. Though DP-RTF fully encodes the sound spatial cues and serves as a reliable localization feature, it is often erroneously estimated in the presence of noise and reverberation. This paper proposes to learn DP-RTF with deep neural networks for robust binaural sound source localization. A DP-RTF learning network is designed to regress the binaural sensor signals to a real-valued representation of DP-RTF. It consists of a branched convolutional neural network module that separately extracts the inter-channel magnitude and phase patterns, and a convolutional recurrent neural network module for joint feature learning. To better exploit the speech spectra to aid DP-RTF estimation, a monaural speech enhancement network is used to recover the direct-path spectrograms from the noisy ones. The enhanced spectrograms are stacked onto the noisy spectrograms to form the input of the DP-RTF learning network. We train a single DP-RTF learning network using many different binaural arrays to enable the generalization of DP-RTF learning across arrays. This avoids time-consuming training data collection and network retraining for each new array, which is useful in practical applications. Experimental results on both simulated and real-world data show the effectiveness of the proposed method for direction of arrival (DOA) estimation in noisy and reverberant environments, as well as good generalization to unseen binaural arrays.
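
The ratio definition in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes an idealized free-field direct path (a pure delay plus a gain per channel), and the function names, the delay and gain values, and the choice of stacking real and imaginary parts as the "real-valued representation" are all hypothetical.

```python
import cmath
import math


def dp_rtf(freq_hz, delay_ref_s, delay_other_s, gain_ref=1.0, gain_other=1.0):
    """Ratio of two idealized direct-path transfer functions at one frequency.

    Assumption: each direct path is modeled as gain * exp(-j*2*pi*f*delay).
    """
    h_ref = gain_ref * cmath.exp(-2j * math.pi * freq_hz * delay_ref_s)
    h_other = gain_other * cmath.exp(-2j * math.pi * freq_hz * delay_other_s)
    return h_other / h_ref


def real_valued(rtf_per_freq):
    """Stack real and imaginary parts into one real vector (assumed encoding)."""
    return [z.real for z in rtf_per_freq] + [z.imag for z in rtf_per_freq]


# Hypothetical setup: the second channel is 0.5 ms later and slightly weaker.
freqs = [250.0, 500.0, 1000.0]
rtf = [dp_rtf(f, 0.0, 0.0005, 1.0, 0.8) for f in freqs]

# Magnitude gives the inter-channel level difference, phase the phase difference.
level_db = [20 * math.log10(abs(z)) for z in rtf]
phase_rad = [cmath.phase(z) for z in rtf]
features = real_valued(rtf)
```

Under this model the magnitude of the ratio is the same at every frequency, while the phase grows linearly with frequency, which is why the two cues are often treated as separate patterns.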

Year	2021
DOI	10.1109/TASLP.2021.3120641
Venue	IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Keywords	Location awareness, Feature extraction, Arrays, Speech enhancement, Spectrogram, Deep learning, Transfer functions, Direct-path relative transfer function, sound source localization, direction of arrival, deep neural network
DocType	Journal
Volume	29
Issue	1
ISSN	2329-9290
Citations	0
PageRank	0.34
References	14
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Bing Yang	1	44	8.37
Hong Liu	2	747	82.65
Xiaofei Li	3	103	24.78
