Title
Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning
Abstract
By solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task. In audio and speech signal processing, a wide range of features has been engineered over decades of research. Learning to predict such features has proven to be a particularly relevant pretext task, yielding self-supervised representations that are effective on downstream tasks. However, methods and common practices for combining such pretext tasks to improve downstream performance remain poorly explored and understood. In practice, the process relies almost exclusively on computationally heavy experimentation, which becomes intractable as the number of pretext tasks grows. This paper introduces a method to select a group of pretext tasks from a set of candidates. The proposed method estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during self-supervised training. Experiments on automatic speech recognition, speaker recognition, emotion recognition and instrument classification validate the approach: the groups selected and weighted with our method outperform classic baselines, which facilitates the selection and combination of relevant pretext-task labels for self-supervised representation learning.
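The abstract describes training on a weighted combination of per-task partial losses and keeping a subset of candidate pretext tasks. The sketch below only illustrates that general idea, total loss = sum over tasks of w_k * L_k. It assumes softmax-normalised weights and a simple weight-threshold selection rule. Both of these are illustrative assumptions, not the authors' published weighting or selection criterion, and the sketch does not implement the conditional-independence estimation named in the keywords. The names `weight_logits`, `select_tasks` and `threshold` are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pretext_loss(partial_losses, weight_logits):
    """Combine per-task partial losses into one self-supervised objective.

    Assumption (illustrative only): unconstrained per-task scores are mapped
    by a softmax to non-negative weights that sum to 1.
    """
    if len(partial_losses) != len(weight_logits):
        raise ValueError("one weight per pretext task is required")
    weights = softmax(weight_logits)
    total = sum(w * loss for w, loss in zip(weights, partial_losses))
    return total, weights


def select_tasks(task_names, weights, threshold):
    # Hypothetical selection rule: keep tasks whose weight exceeds a threshold.
    return [name for name, w in zip(task_names, weights) if w > threshold]
```

For example, `weighted_pretext_loss([0.8, 1.2, 0.5], [0.0, 0.0, 0.0])` weights the three partial losses equally. Calling `select_tasks` on the weights produced by a skewed set of scores keeps only the dominant tasks. A softmax is used here simply because it keeps the weights comparable across tasks. The actual calibration procedure is described in the paper itself.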
Year: 2022
DOI: 10.1109/JSTSP.2022.3195430
Venue: IEEE Journal of Selected Topics in Signal Processing
Keywords: Self-Supervised learning, conditional independence, audio representation learning
DocType: Journal
Volume: 16
Issue: 6
ISSN: 1932-4553
Citations: 0
PageRank: 0.34
References: 29
Authors: 4
Name | Order | Citations | PageRank
Salah Zaiem | 1 | 0 | 0.68
Titouan Parcollet | 2 | 16 | 9.23
Slim Essid | 3 | 212 | 32.00
Abdel Heba | 4 | 0 | 0.34