One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement - Citegraph

Paper Info

Title
One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement

Abstract
With the recent surge in the use of video conferencing tools, providing high-quality speech signals and accurate captions has become essential for conducting day-to-day business or connecting with friends and family. Single-channel personalized speech enhancement (PSE) methods show promising results compared with unconditional speech enhancement (SE) methods in these scenarios due to their ability to remove interfering speech in addition to environmental noise. In this work, we leverage spatial information afforded by microphone arrays to further improve such systems’ performance. We investigate the relative importance of speaker embeddings and spatial features. Moreover, we propose a new causal array-geometry-agnostic multi-channel PSE model, which can generate a high-quality enhanced signal from an arbitrary microphone geometry. Experimental results show that the proposed geometry-agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy. We also demonstrate the effectiveness of the proposed approach for unseen array geometries.

Year	2022
DOI	10.1109/ICASSP43922.2022.9747395
Venue	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords	multi-channel speech enhancement, target speech extraction, spatial features, microphone array
DocType	Conference
ISSN	1520-6149
ISBN	978-1-6654-0541-6
Citations	0
PageRank	0.34
References	15
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Hassan Taherian	1	0	0.34
Sefik Emre Eskimez	2	15	5.34
Takuya Yoshioka	3	585	49.20
Huaming Wang	4	13	2.35
Zhuo Chen	5	153	24.33
Xuedong Huang	6	1390	283.19
