Abstract |
---|
In this paper, we propose a novel cross-parallel transformer neural network (CPTNN) for end-to-end speech enhancement in the time domain. The proposed structure comprises an encoder, a cross-parallel transformer module (CPTM), a masking module and a decoder. The encoder first maps the input waveform of noisy speech into feature representations. The CPTM consists of four residually connected cross-parallel transformer blocks, each utilizing local and global transformers to simultaneously extract local and global features, which are then fused by a cross-attention-based transformer to obtain a better contextual feature representation. The masking module generates a mask that is multiplied with the encoder output, producing the masked encoder features, which the decoder finally uses to reconstruct the enhanced speech. Experiments on a benchmark dataset show that our CPTNN outperforms state-of-the-art methods on most evaluation criteria while having the fewest model parameters. |
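The core idea of the fusion step described above — using cross-attention so that features from one branch (local) attend to features from the other branch (global) — can be illustrated with a minimal single-head sketch. This is not the paper's implementation: the function name, feature dimensions, and the omission of learned query/key/value projections and multi-head structure are all simplifying assumptions here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(local_feats, global_feats):
    """Single-head cross-attention sketch (hypothetical helper):
    local features act as queries, global features as keys and values.
    Learned W_q/W_k/W_v projections of a real transformer are omitted."""
    d_k = local_feats.shape[-1]
    scores = local_feats @ global_feats.T / np.sqrt(d_k)  # (T_local, T_global)
    weights = softmax(scores, axis=-1)                    # attend over global frames
    return weights @ global_feats                         # (T_local, d) fused output

rng = np.random.default_rng(0)
local_feats = rng.standard_normal((10, 64))   # 10 frames, 64-dim (illustrative sizes)
global_feats = rng.standard_normal((20, 64))  # 20 frames, 64-dim
fused = cross_attention_fuse(local_feats, global_feats)
print(fused.shape)  # (10, 64): one fused vector per local frame
```

Each local frame receives a weighted summary of the entire global sequence, which is one plausible reading of how the cross-attention-based transformer combines the two parallel feature streams into a single contextual representation.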
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/IWAENC53105.2022.9914777 | 2022 International Workshop on Acoustic Signal Enhancement (IWAENC) |
Keywords | DocType | ISBN
---|---|---|
Cross-parallel transformer, local and global information, cross-attention, low model complexity, speech enhancement | Conference | 978-1-6654-6868-8
Citations | PageRank | References
---|---|---|
0 | 0.34 | 9
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kai Wang | 1 | 0 | 1.01 |
Bengbeng He | 2 | 0 | 1.01 |
Wei-Ping Zhu | 3 | 0 | 1.01 |