Auditory-inspired sparse representation of audio signals - Citegraph

Paper Info

Title
Auditory-inspired sparse representation of audio signals

Abstract
This article deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To do so, we first generate sparse audio representations we call spikegrams, using projections on gammatone/gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are powerful at identifying auditory events, such as onsets, offsets, transients, and harmonic structures. We show that the introduction of adaptiveness in the selection of gammachirp kernels enhances the compression rate compared to the case where the kernels are non-adaptive. We also integrate a masking model that helps reduce bitrate without loss of perceptible audio quality. We finally propose a method to extract frequent audio objects (patterns) in the aforementioned sparse representations. The extracted frequency-domain patterns (audio objects) help us address spikes (audio events) collectively rather than individually. When audio compression is needed, the different patterns are stored in a small codebook that can be used to efficiently encode audio materials in a lossless way. The approach is applied to different audio signals and results are discussed and compared. This work is a first step towards the design of a high-quality auditory-inspired "object-based" audio coder.
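
The abstract does not give the kernel parameters, the adaptive gammachirp selection rule, or the masking model, so the following is only a minimal sketch of the general idea: projecting a signal onto a small bank of gammatone kernels with plain matching pursuit (a technique named in the keywords below) to obtain a list of "spikes". The function names `gammatone` and `matching_pursuit`, the 25 ms kernel duration, the filter order of 4, and the ERB-based bandwidth are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gammatone(fc, fs, duration=0.025, order=4):
    """Unit-norm gammatone kernel centred at fc Hz (illustrative parameters)."""
    t = np.arange(int(round(duration * fs))) / fs
    # Assumed bandwidth: Glasberg-Moore ERB approximation times 1.019.
    bandwidth = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def matching_pursuit(signal, kernels, n_spikes):
    """Greedy matching pursuit over time-shifted kernels.

    Returns (spikes, residual); each spike is
    (kernel_index, sample_offset, amplitude).
    """
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            # Inner product of the residual with the kernel at every valid shift.
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, float(corr[pos]))
        k, pos, amp = best
        # Subtract the selected atom; kernels are unit-norm, so amp is the projection.
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        spikes.append(best)
    return spikes, residual


if __name__ == "__main__":
    fs = 16000
    kernels = [gammatone(fc, fs) for fc in (500.0, 1000.0, 2000.0)]
    x = np.zeros(2000)
    x[100:100 + len(kernels[1])] += 0.8 * kernels[1]
    spikes, residual = matching_pursuit(x, kernels, n_spikes=1)
    print(spikes, np.linalg.norm(residual))
```

Each returned triple plays the role of one point in a spikegram (channel, time, amplitude). A fixed three-kernel bank is used here only to keep the example small; the adaptive kernel selection, masking, and pattern extraction stages described in the abstract are not modelled.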

Year	DOI	Venue
2011	10.1016/j.specom.2010.09.008	Speech Communication
Keywords	Field	DocType
perceptible audio quality,temporal data mining,audio coder,audio coding,auditory pattern recognition,sparse audio representation,auditory-inspired sparse representation,different audio signal,episode discovery,sparse representations,frequent audio object,audio compression,quantization,encode audio material,audio object,matching pursuit,masking,audio event,frequency domain,pattern recognition,sparse representation	Audio signal,Speech coding,Pattern recognition,Computer science,Audio mining,Sparse approximation,Speech recognition,Sound quality,Artificial intelligence,Audio signal processing,Data compression,Dynamic range compression	Journal
Volume	Issue	ISSN
53	5	Speech Communication
Citations	PageRank	References
11	0.76	16
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Ramin Pichevar	1	56	9.92
Hossein Najaf-Zadeh	2	22	4.37
Louis Thibault	3	20	3.21
Hassan Lahdili	4	17	2.06
