Abstract | ||
---|---|---|
Voice activity detection serves as an essential pre-processor in modern speech processing systems. It classifies audio segments into speech and nonspeech. Many state-of-the-art methods have been proposed to increase the detection accuracy. However, there are still significant limitations to retaining high performance while keeping low computation complexity, especially in handling unseen noises. This paper proposes a computation-efficient neural network using a multi-channel audio feature. The audio feature is contextual-aware with positional information and is represented in a three-channel way, similar to RGB pictures, which enables convolutional kernels to capture more information simultaneously. Meanwhile, we introduce channel attention inverted blocks to build a computation-efficient neural network. Our proposed method shows superior performance with extremely few floating point operations as compared with baseline methods. |
Year | Venue | Keywords |
---|---|---|
2022 | 2022 30th European Signal Processing Conference (EUSIPCO) | voice activity detection, channel attention, computation-efficient, deep neural network |
DocType | ISSN | ISBN |
Conference | 2219-5491 | 978-1-6654-6799-5 |
Citations | PageRank | References |
0 | 0.34 | 11 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Runze Wang | 1 | 0 | 0.34 |
Iman Moazzen | 2 | 0 | 0.34 |
Wei-Ping Zhu | 3 | 0 | 1.01 |