Title
CPTNN: Cross-Parallel Transformer Neural Network for Time-Domain Speech Enhancement
Abstract
In this paper, we propose a novel cross-parallel transformer neural network (CPTNN) for end-to-end speech enhancement in the time domain. The new structure comprises an encoder, a cross-parallel transformer module (CPTM), a masking module and a decoder. The encoder first maps the input waveform of noisy speech into feature representations. The CPTM consists of four residually connected cross-parallel transformer blocks, each utilizing local and global transformers to simultaneously extract local and global features, which are then fused by a cross-attention-based transformer to obtain a better contextual feature representation. The masking module generates a mask that is multiplied with the encoder output to produce masked encoder features, from which the decoder finally reconstructs the enhanced speech. Experiments on a benchmark dataset indicate that our CPTNN outperforms state-of-the-art methods on most evaluation criteria while having the fewest model parameters.
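The dataflow described above (parallel local and global transformers whose outputs are fused by cross-attention, with a residual connection) can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the chunk size, feature dimension, fusion direction (local features querying global features), and the omission of feed-forward layers and layer normalization are all assumptions made for brevity.

```python
# Illustrative sketch of a cross-parallel transformer block's dataflow.
# All hyperparameters and the fusion scheme are assumptions, not the
# authors' exact CPTNN design.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def local_branch(x, chunk=16):
    """Local transformer stand-in: attend only within fixed-size chunks."""
    t, d = x.shape
    pad = (-t) % chunk                      # pad so length divides evenly
    xp = np.pad(x, ((0, pad), (0, 0)))
    xc = xp.reshape(-1, chunk, d)           # (n_chunks, chunk, d)
    out = attention(xc, xc, xc).reshape(-1, d)
    return out[:t]                          # drop the padding

def global_branch(x):
    """Global transformer stand-in: attend across the full sequence."""
    return attention(x, x, x)

def cross_parallel_block(x):
    """Run local/global branches in parallel, fuse them by cross-attention."""
    loc = local_branch(x)
    glo = global_branch(x)
    # Cross-attention fusion (assumed direction): local features act as
    # queries over the global features.
    fused = attention(loc, glo, glo)
    return x + fused                        # residual connection

rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 8))        # (time frames, feature dim)
out = cross_parallel_block(feats)
print(out.shape)                            # -> (50, 8)
```

In the full CPTNN, four such blocks are stacked with residual connections, and the resulting features drive the masking module that gates the encoder output before decoding.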
Year: 2022
DOI: 10.1109/IWAENC53105.2022.9914777
Venue: 2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
Keywords: Cross-parallel transformer, local and global information, cross-attention, low model complexity, speech enhancement
DocType: Conference
ISBN: 978-1-6654-6868-8
Citations: 0
PageRank: 0.34
References: 9
Authors: 3
Kai Wang (order 1, citations 0, PageRank 1.01)
Bengbeng He (order 2, citations 0, PageRank 1.01)
Wei-Ping Zhu (order 3, citations 0, PageRank 1.01)