Abstract |
---|
We present a hybrid framework that leverages the trade-off between temporal and frequency precision in audio representations to improve performance on the speech enhancement task. We first show that conventional approaches built on a single representation, such as raw audio or spectrograms, are each effective at targeting different types of noise. By integrating both approaches, our model can learn multi-scale, multi-domain features that remove noise in different regions of the time-frequency space in a complementary way. Experimental results show that the proposed hybrid model yields better performance and robustness than either model used individually. |
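
The trade-off between temporal and frequency precision that the abstract builds on can be made concrete with a short-time Fourier transform (STFT). The following is an illustrative sketch only, not the authors' model. It assumes a 16 kHz sampling rate and treats frequency resolution as FFT bin spacing (`sample_rate / window_length`). The function name `stft_resolution` is hypothetical.

```python
from typing import Tuple

# Illustrative sketch of the STFT time-frequency trade-off.
# Assumption: 16 kHz sampling rate (common for speech); not taken from the paper.
# "Frequency resolution" here means FFT bin spacing, a simplification.


def stft_resolution(window_length: int, sample_rate: int = 16_000) -> Tuple[float, float]:
    """Return (frame_duration_ms, bin_spacing_hz) for a given window length in samples."""
    frame_ms = 1000.0 * window_length / sample_rate  # time span covered by one frame
    bin_hz = sample_rate / window_length             # spacing between adjacent FFT bins
    return frame_ms, bin_hz


if __name__ == "__main__":
    for n in (64, 256, 1024):
        t, f = stft_resolution(n)
        # The product t * f is constant: shorter windows buy time precision
        # at the cost of frequency precision, and vice versa.
        print(f"window={n:5d} samples  frame={t:6.1f} ms  bins={f:8.3f} Hz  t*f={t * f:.1f}")
```

Under these assumptions, a 64-sample window gives 4 ms frames but 250 Hz bins, whereas a 1024-sample window gives 15.625 Hz bins at the cost of 64 ms frames. A raw-waveform model sits at the time-precise extreme (one sample is 0.0625 ms at 16 kHz). This is one plausible reading of why waveform and spectrogram models suit different noise types.
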
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Audio and Speech Processing | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1812.08914 | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jang Hyun Kim | 1 | 3 | 3.14 |
Jae Jun Yoo | 2 | 157 | 9.48 |
Sanghyuk Chun | 3 | 19 | 4.64 |
Adrian Kim | 4 | 4 | 2.78 |
Jung-Woo Ha | 5 | 216 | 25.36 |