Abstract |
---|
In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and is applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/ICASSP43922.2022.9747120 | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hyun Joon Park | 1 | 0 | 0.68 |
Byung Ha Kang | 2 | 0 | 0.34 |
Wooseok Shin | 3 | 0 | 0.68 |
Jin Sob Kim | 4 | 0 | 0.68 |
Sung Won Han | 5 | 0 | 0.68 |