Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation - Citegraph

Paper Info

Title
Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation

Abstract
Audio source separation is often used for the preprocessing of various tasks, and one of its ultimate goals is to construct a single versatile preprocessor that can handle every variety of audio signal. One of the most important varieties of the discrete-time audio signal is sampling frequency. Since it is usually task-specific, the versatile preprocessor must handle all the sampling frequencies required by the possible downstream tasks. However, conventional models based on deep neural networks (DNNs) are not designed for handling a variety of sampling frequencies. Thus, for unseen sampling frequencies, they may not work appropriately. In this paper, we propose sampling-frequency-independent (SFI) convolutional layers capable of handling various sampling frequencies. The core idea of the proposed layers comes from our finding that a convolutional layer can be viewed as a collection of digital filters and inherently depends on sampling frequency. To overcome this dependency, we propose an SFI structure that features analog filters and generates weights of a convolutional layer from the analog filters. By utilizing time- and frequency-domain analog-to-digital filter conversion techniques, we can adapt the convolutional layer for various sampling frequencies. As an example application, we construct an SFI version of a conventional source separation network. Through music source separation experiments, we show that the proposed layers enable separation networks to consistently work well for unseen sampling frequencies in objective and perceptual separation qualities. We also demonstrate that the proposed method outperforms a conventional method based on signal resampling when the sampling frequencies of input signals are significantly lower than the trained sampling frequency.
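As a rough illustration of the time-domain analog-to-digital conversion mentioned in the abstract, the sketch below samples one analog impulse response at several sampling frequencies to produce convolution kernels. This is a minimal sketch, not the authors' implementation: the modulated-Gaussian filter shape, its parameter values, and the function names `analog_impulse_response`, `sfi_kernel`, and `gain_at` are all assumptions made for this example.

```python
import numpy as np

def analog_impulse_response(t, center_hz=1000.0, sigma_s=0.5e-3):
    """Hypothetical analog filter: a Gaussian envelope modulated by a cosine."""
    return np.exp(-0.5 * (t / sigma_s) ** 2) * np.cos(2.0 * np.pi * center_hz * t)

def sfi_kernel(sample_rate_hz, duration_s=4e-3):
    """Build a digital kernel by sampling the analog filter (impulse invariance).

    The kernel length grows with the sampling frequency so that the kernel
    always spans the same duration in seconds.
    """
    period = 1.0 / sample_rate_hz
    half = int(round(0.5 * duration_s * sample_rate_hz))
    n = np.arange(-half, half + 1)
    # Scaling by the sampling period keeps the gain comparable across rates.
    return period * analog_impulse_response(n * period)

def gain_at(kernel, sample_rate_hz, freq_hz):
    """Magnitude response of the kernel at a physical frequency in Hz."""
    n = np.arange(len(kernel))
    phase = np.exp(-2j * np.pi * freq_hz * n / sample_rate_hz)
    return np.abs(np.sum(kernel * phase))

# One analog filter, three sampling frequencies (assumed values).
kernels = {fs: sfi_kernel(fs) for fs in (8000, 16000, 32000)}
for fs, kernel in kernels.items():
    print(fs, len(kernel), gain_at(kernel, fs, 1000.0))
```

In this sketch the kernels differ in length but share nearly the same gain at 1 kHz, which is the sense in which weights generated from an analog filter can follow the input's sampling frequency. In a trainable layer, the analog filter parameters (here `center_hz` and `sigma_s`) would be the learned quantities instead of the kernel taps.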

Year	2022
DOI	10.1109/TASLP.2022.3203907
Venue	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Keywords	Convolution, Source separation, Finite impulse response filters, Task analysis, Time-frequency analysis, Time-domain analysis, Information filters, Audio source separation, analog-to-digital filter conversion, convolutional layer, deep neural networks
DocType	Journal
Volume	30
Issue	1
ISSN	2329-9290
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Koichi Saito	1	0	0.34
Tomohiko Nakamura	2	13	5.02
Kohei Yatabe	3	16	10.36
Hiroshi Saruwatari	4	652	90.81
