Abstract |
---|
Audio classification is an important task in machine learning with a wide range of applications. Over the last decade, deep learning based methods have been widely adopted, and transformer-based models are becoming the new paradigm for audio classification. In this paper, we present Spectrogram Transformers, a group of transformer-based models for audio classification. Based on the fundamental semantics of the audio spectrogram, we design two mechanisms to extract temporal and frequency features from the spectrogram, named time-dimension sampling and frequency-dimension sampling. These discriminative representations are then enhanced by various combinations of attention block architectures, including Temporal Only (TO) attention, Temporal-Frequency Sequential (TFS) attention, Temporal-Frequency Parallel (TFP) attention, and Two-Stream Temporal-Frequency (TSTF) attention, to extract the sound record signatures that serve the classification task. Our experiments demonstrate that these transformer models outperform state-of-the-art methods on the ESC-50 dataset without a pre-training stage. Furthermore, our method also shows high efficiency compared with other leading methods. |
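As a rough illustration of the two-stream idea described in the abstract, the sketch below builds tokens along the time axis (time-dimension sampling) and along the frequency axis (frequency-dimension sampling) of a spectrogram, encodes each stream with a standard transformer, and fuses the pooled features for classification, in the spirit of the TSTF variant. This is a minimal sketch and not the authors' implementation: the tensor shapes, layer sizes, mean pooling, and fusion by concatenation are all assumptions.

```python
# Minimal sketch (not the paper's code) of a two-stream temporal-frequency
# transformer over a spectrogram. All hyperparameters are assumed.
import torch
import torch.nn as nn

class TwoStreamSpectrogramTransformer(nn.Module):
    def __init__(self, n_freq_bins=128, n_time_frames=256,
                 d_model=192, n_heads=4, n_layers=2, n_classes=50):
        super().__init__()
        # Time-dimension sampling: each time frame (all freq bins) -> one token.
        self.time_proj = nn.Linear(n_freq_bins, d_model)
        # Frequency-dimension sampling: each freq bin (all time frames) -> one token.
        self.freq_proj = nn.Linear(n_time_frames, d_model)

        def encoder():
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads,
                dim_feedforward=4 * d_model, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=n_layers)

        self.temporal_stream = encoder()   # attention over time tokens
        self.frequency_stream = encoder()  # attention over frequency tokens
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, spec):  # spec: (batch, freq_bins, time_frames)
        time_tokens = self.time_proj(spec.transpose(1, 2))  # (B, T, d)
        freq_tokens = self.freq_proj(spec)                  # (B, F, d)
        t = self.temporal_stream(time_tokens).mean(dim=1)   # pooled temporal feature
        f = self.frequency_stream(freq_tokens).mean(dim=1)  # pooled frequency feature
        return self.head(torch.cat([t, f], dim=-1))         # class logits

# Usage: a batch of 8 mel spectrograms, 128 bins x 256 frames,
# classified into the 50 ESC-50 categories.
model = TwoStreamSpectrogramTransformer()
logits = model(torch.randn(8, 128, 256))  # -> shape (8, 50)
```

The single-stream variants named in the abstract (TO, TFS, TFP) would presumably reuse the same token-building step and differ only in whether the temporal and frequency attention blocks are used alone, chained, or run in parallel.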
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/IST55454.2022.9827729 | 2022 IEEE International Conference on Imaging Systems and Techniques (IST) |
Keywords | DocType | ISSN
---|---|---|
Transformer, Spectrogram, Audio representation, Audio classification | Conference | 1558-2809
ISBN | Citations | PageRank
---|---|---|
978-1-6654-8103-8 | 0 | 0.34
References | Authors
---|---|
10 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Yixiao Zhang | 1 | 0 | 0.34 |
Baihua Li | 2 | 176 | 21.71 |
Hui Fang | 3 | 0 | 1.01 |
Qinggang Meng | 4 | 273 | 23.54 |