Title | ||
---|---|---|
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain |
Abstract | ||
---|---|---|
Blind source separation exploiting multichannel information has long been a popular topic, and recently proposed methods based on the local Gaussian model have shown promising results despite their high computational cost when many microphone signals are used. The low updating speed of such a model is mainly due to the inversion of a spatial covariance matrix, whose complexity increases with the number of microphones, $M$, and is generally of order $O(M^3)$. Several projection-based approaches that attempt to concentrate energy on the diagonal part of the spatial covariance matrix have been introduced to circumvent the matrix inversion, which can reduce the complexity to $O(M)$. In this article, we focus on the fast Fourier transform as a projection method because the energy concentration on the diagonal can be achieved more efficiently than with other projection-based methods. For the case where the diagonalization is imperfect, for example, owing to discontinuities at the edge of a linear array, we also developed a more robust algorithm approximating the tri-diagonal part of the spatial covariance matrix, which requires a complexity of $O(M^2)$ for the inversion by applying the Thomas algorithm. To remove the ad hoc integration of post-clustering after the decomposition, we also examine a self-clustering algorithm. Our evaluation shows better separation quality under reverberant conditions than previously proposed methods, as well as higher efficiency than multichannel non-negative matrix factorization. |
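
The abstract names the Thomas algorithm for the tri-diagonal case. As a rough illustration only, the sketch below solves one tridiagonal system in $O(M)$ operations and then builds the full inverse by solving for each of the $M$ unit vectors, which costs $O(M^2)$ overall. The function names `thomas_solve` and `tridiagonal_inverse` are hypothetical, the code is not taken from the paper, and reading the $O(M^2)$ figure as "$M$ solves of $O(M)$ each" is an assumption rather than the authors' stated derivation.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs in O(M) operations.

    lower: the M-1 sub-diagonal entries of A
    diag:  the M diagonal entries of A
    upper: the M-1 super-diagonal entries of A
    No pivoting is performed, so this sketch assumes A is well
    conditioned (for example, diagonally dominant).
    """
    m = len(diag)
    c = [0.0] * m  # modified super-diagonal after forward elimination
    d = [0.0] * m  # modified right-hand side after forward elimination
    if m > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, m):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < m - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * m
    x[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiagonal_inverse(lower, diag, upper):
    """Form the dense inverse with M Thomas solves, O(M^2) in total."""
    m = len(diag)
    columns = []
    for j in range(m):
        unit = [1.0 if r == j else 0.0 for r in range(m)]
        columns.append(thomas_solve(lower, diag, upper, unit))
    # columns[j][i] holds entry (i, j) of the inverse; return row-major.
    return [[columns[j][i] for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    # Small real-valued example; complex entries work the same way.
    print(thomas_solve([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 0.0, 1.0]))
```

The same arithmetic works with Python `complex` entries, which is the relevant case for a Hermitian spatial covariance matrix. A production implementation would more likely call a banded solver from a numerical library than hand-roll the recursion.
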
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TASLP.2019.2948770 | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Keywords | Field | DocType |
Kernel,Covariance matrices,Computational efficiency,Microphones,Symmetric matrices,Source separation,Approximation algorithms | Covariance function,Pattern recognition,Matrix (mathematics),Computer science,Wavenumber,Algorithm,Artificial intelligence,Non-negative matrix factorization | Journal |
Volume | Issue | ISSN |
28 | 1 | 2329-9290 |
Citations | PageRank | References |
1 | 0.36 | 8 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuki Mitsufuji | 1 | 36 | 9.50 |
Stefan Uhlich | 2 | 35 | 7.62 |
Norihiro Takamune | 3 | 35 | 10.18 |
Daichi Kitamura | 4 | 142 | 21.21 |
Shoichi Koyama | 5 | 68 | 17.76 |
Hiroshi Saruwatari | 6 | 652 | 90.81 |