Title
Spectrogram Transformers for Audio Classification
Abstract
Audio classification is an important task in machine learning with a wide range of applications. Over the last decade, deep learning based methods have been widely used, and transformer-based models are becoming a new paradigm for audio classification. In this paper, we present Spectrogram Transformers, a group of transformer-based models for audio classification. Based on the fundamental semantics of the audio spectrogram, we design two mechanisms to extract temporal and frequency features from it, named time-dimension sampling and frequency-dimension sampling. These discriminative representations are then enhanced by various combinations of attention block architectures, including Temporal Only (TO) attention, Temporal-Frequency Sequential (TFS) attention, Temporal-Frequency Parallel (TFP) attention, and Two-stream Temporal-Frequency (TSTF) attention, to extract the signatures of sound recordings that serve the classification task. Our experiments demonstrate that these Transformer models outperform the state-of-the-art methods on the ESC-50 dataset without a pre-training stage. Furthermore, our method is also highly efficient compared with other leading methods.
Year: 2022
DOI: 10.1109/IST55454.2022.9827729
Venue: 2022 IEEE International Conference on Imaging Systems and Techniques (IST)
Keywords: Transformer, Spectrogram, Audio representation, Audio classification
DocType: Conference
ISSN: 1558-2809
ISBN: 978-1-6654-8103-8
Citations: 0
PageRank: 0.34
References: 10
Authors: 4
Name             Order  Citations  PageRank
Yixiao Zhang     1      0          0.34
Baihua Li        2      176        21.71
Hui Fang         3      0          1.01
Qinggang Meng    4      273        23.54