Abstract |
---|
As the most widely used spatial filtering approach for multi-channel speech separation, beamforming extracts the target speech signal arriving from a specific direction. An emerging alternative approach is multi-channel complex spectral mapping, which trains a deep neural network (DNN) to directly estimate the real and imaginary spectrograms of the target speech signal from those of the multi-channel noisy mixture. In this all-neural approach, the trained DNN itself becomes a nonlinear, time-varying spectrospatial filter. However, it remains unclear how this approach performs relative to commonly used beamforming techniques across different array configurations and acoustic environments. This paper is devoted to examining this issue in a systematic way. Comprehensive evaluations show that multi-channel complex spectral mapping achieves separation performance comparable to or better than beamforming for different array geometries and speech separation tasks, and that it reduces to monaural complex spectral mapping in single-channel conditions, demonstrating the general utility of this approach for both multi-channel and single-channel speech separation. In addition, this approach is computationally more efficient than widely used mask-based beamforming. We conclude that the neural spectrospatial filter provides a strong alternative to traditional and mask-based beamforming. |
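The abstract describes the input/output layout of multi-channel complex spectral mapping: the real and imaginary STFT spectrograms of all microphone channels are stacked as network input, and the network predicts the real and imaginary spectrograms of the target. The sketch below illustrates only this tensor layout with NumPy; the function names, shapes, and the channel-averaging stand-in for the DNN are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stack_ri(mixture_stft):
    """Stack real and imaginary parts of a multi-channel complex STFT.

    mixture_stft: complex array of shape (C, F, T) -- C microphones,
    F frequency bins, T frames.  Returns a real array of shape (2C, F, T),
    the input layout used by multi-channel complex spectral mapping.
    """
    return np.concatenate([mixture_stft.real, mixture_stft.imag], axis=0)

def reassemble(ri_output):
    """Combine a (2, F, T) real/imaginary network output into a complex STFT."""
    return ri_output[0] + 1j * ri_output[1]

# Toy example with a placeholder "network": average across channels and pass
# real/imag parts through unchanged.  A real system uses a trained DNN here.
rng = np.random.default_rng(0)
C, F, T = 4, 257, 100
mixture = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))

features = stack_ri(mixture)                        # (2C, F, T) network input
dummy_out = np.stack([features[:C].mean(axis=0),    # predicted real part
                      features[C:].mean(axis=0)])   # predicted imaginary part
target_est = reassemble(dummy_out)                  # complex spectrogram estimate
print(features.shape, target_est.shape)
```

Because the placeholder averages real and imaginary parts separately, `target_est` here equals the channel-averaged mixture spectrogram; in the actual approach, the network output would be inverted with an iSTFT to recover the time-domain target signal.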
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TASLP.2022.3145319 | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Keywords | DocType | Volume
---|---|---
Beamforming, deep learning, multi-channel complex spectral mapping, spectrospatial filtering, speech separation | Journal | 30
Issue | ISSN | Citations
---|---|---
1 | 2329-9290 | 1
PageRank | References | Authors
---|---|---
0.37 | 15 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Tan Ke | 1 | 40 | 9.22 |
Zhong-Qiu Wang | 2 | 68 | 9.93 |
DeLiang Wang | 3 | 49 | 2.71 |