Look, Listen, And Learn More: Design Choices For Deep Audio Embeddings - Citegraph

Paper Info

Title
Look, Listen, And Learn More: Design Choices For Deep Audio Embeddings

Abstract
A considerable challenge in applying deep learning to audio classification is the scarcity of labeled data. An increasingly popular solution is to learn deep audio embeddings from large audio collections and use them to train shallow classifiers using small labeled datasets. Look, Listen, and Learn (L³-Net) is an embedding trained through self-supervised learning of audio-visual correspondence in videos as opposed to other embeddings requiring labeled data. This framework has the potential to produce powerful out-of-the-box embeddings for downstream audio classification tasks, but has a number of unexplained design choices that may impact the embeddings' behavior. In this paper we investigate how L³-Net design choices impact the performance of downstream audio classifiers trained with these embeddings. We show that audio-informed choices of input representation are important, and that using sufficient data for training the embedding is key. Surprisingly, we find that matching the content for training the embedding to the downstream task is not beneficial. Finally, we show that our best variant of the L³-Net embedding outperforms both the VGGish and SoundNet embeddings, while having fewer parameters and being trained on less data. Our implementation of the L³-Net embedding model as well as pre-trained models are made freely available online.

Year	DOI	Venue
2019	10.1109/icassp.2019.8682475	2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords	Field	DocType
Audio classification, machine listening, deep audio embeddings, deep learning, transfer learning	Training set, Data modeling, Embedding, Pattern recognition, Task analysis, Computer science, Spectrogram, Artificial intelligence, Labeled data, Deep learning, Machine learning	Conference
ISSN	Citations	PageRank
1520-6149	1	0.35
References	Authors
0	4

Authors (4 rows)

Name	Order	Citations	PageRank
Jason Cramer	1	4	2.15
Ho-Hsiang Wu	2	2	2.05
Justin Salamon	3	632	43.64
Juan Pablo Bello	4	1215	108.94
