Abstract | | |
---|---|---|
Speaker-independent speech separation is a challenging audio processing problem. In recent years, several deep learning algorithms have been proposed to address this problem. Most of these methods use noncausal implementations, which limits their application in real-time scenarios such as wearable hearing devices and low-latency telecommunication. In this paper, we propose the Online Deep Attractor Network (ODANet), a causal extension of the Deep Attractor Network (DANet) that enables real-time speech separation. In contrast to DANet, which estimates a global attractor point for each speaker from the entire utterance, ODANet estimates the attractors at each time step and tracks them using a dynamic weighting function that relies only on causal information. This not only solves the speaker tracking problem, but also allows ODANet to generate more stable embeddings across time. Experimental results show that ODANet achieves separation accuracy similar to that of the noncausal DANet on both two-speaker and three-speaker speech separation problems, which makes it a suitable candidate for applications that require robust real-time speech processing. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICASSP.2019.8682884 | ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | Field | DocType |
Real-time systems,Training,Task analysis,Mathematical model,Microsoft Windows,Estimation,Spectrogram | Attractor,Speech processing,Weighting,Pattern recognition,Computer science,Spectrogram,Attractor network,Communication channel,Speech recognition,Artificial intelligence,Deep learning,Audio signal processing | Conference |
ISSN | ISBN | Citations |
1520-6149 | 978-1-4799-8131-1 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cong Han | 1 | 7 | 4.56 |
Yi Luo | 2 | 120 | 13.05 |
Nima Mesgarani | 3 | 256 | 22.43 |
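
The abstract describes estimating attractors at each time step and tracking them with a dynamic weighting function that uses only causal information. The following is a minimal sketch of that idea, not the authors' implementation. The function name `track_attractors`, the softmax assignment, and the running-mean weight `alpha` (the share of a speaker's accumulated assignment mass contributed by the current frame) are all assumptions chosen for illustration; the paper's actual weighting function may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def track_attractors(embeddings, init_attractors, eps=1e-8):
    """Causally track per-speaker attractors, one frame at a time.

    embeddings:      (T, F, K) embedding of each time-frequency bin
    init_attractors: (C, K) starting attractors (hypothetical initialisation)
    Returns attractors of shape (T, C, K) and masks of shape (T, F, C).
    """
    T, F, K = embeddings.shape
    C = init_attractors.shape[0]
    attractors = np.asarray(init_attractors, dtype=float).copy()
    mass = np.zeros(C)  # assignment mass accumulated from past frames only
    all_attractors = np.empty((T, C, K))
    all_masks = np.empty((T, F, C))

    for t in range(T):
        v = embeddings[t]  # (F, K), the current frame only
        # Soft-assign each bin to a speaker using the previous attractors.
        assign = softmax(v @ attractors.T, axis=1)  # (F, C)
        frame_mass = assign.sum(axis=0)  # (C,)
        # Attractor estimated from this frame alone.
        frame_attr = (assign.T @ v) / (frame_mass[:, None] + eps)  # (C, K)
        # Assumed dynamic weight: how much this frame contributes
        # relative to everything seen so far.
        mass += frame_mass
        alpha = frame_mass / (mass + eps)  # (C,)
        attractors = (1.0 - alpha)[:, None] * attractors + alpha[:, None] * frame_attr
        all_attractors[t] = attractors
        # Masks for frame t come from the updated attractors.
        all_masks[t] = softmax(v @ attractors.T, axis=1)

    return all_attractors, all_masks
```

Because every quantity at frame `t` depends only on frames `0..t`, changing later frames leaves earlier attractors and masks unchanged, which is the property that makes this kind of tracking usable in a real-time setting.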