Title
Convolutional vs. Recurrent Neural Networks for Audio Source Separation
Abstract
Recent work has shown that recurrent neural networks can be trained to separate individual speakers in a sound mixture with high fidelity. Here we explore convolutional neural network models as an alternative and show that they achieve state-of-the-art results with an order of magnitude fewer parameters. We also characterize and compare the robustness of these approaches and their ability to generalize under three test conditions: longer time sequences, the addition of intermittent noise, and datasets not seen during training. For the last condition, we create a new dataset, RealTalkLibri, to test source separation in real-world environments. We show that the acoustics of the environment have a significant impact on the structure of the waveform and the overall performance of neural network models, with the convolutional model showing a superior ability to generalize to new environments. The code for our study is available at this https URL.
Year
2018
Venue
arXiv: Sound
Field
High fidelity, Convolutional neural network, Computer science, Waveform, Recurrent neural network, Speech recognition, Robustness (computer science), Artificial neural network, Source separation
DocType
Journal
Volume
abs/1803.08629
Citations
0
PageRank
0.34
References
0
Authors
3
Name                Order  Citations  PageRank
Shariq Mobin        1      1          1.05
Brian Cheung        2      40         3.16
Bruno A. Olshausen  3      4936       6.79