Abstract
---

In time series classification, signals are typically mapped into some intermediate representation which is used to construct models. We introduce the joint time-frequency scattering transform, a locally time-shift invariant representation which characterizes the multiscale energy distribution of a signal in time and frequency. It is computed through wavelet convolutions and modulus non-linearities and may therefore be implemented as a deep convolutional neural network whose filters are not learned but calculated from wavelets. We consider the progression from mel-spectrograms to time scattering and joint time-frequency scattering transforms, illustrating the relationship between increased discriminability and refinements of convolutional network architectures. The suitability of the joint time-frequency scattering transform for characterizing time series is demonstrated through applications to chirp signals and audio synthesis experiments. The proposed transform also obtains state-of-the-art results on several audio classification tasks, outperforming time scattering transforms and achieving accuracies comparable to those of fully learned networks.
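
The cascade of wavelet convolutions and modulus non-linearities described in the abstract can be made concrete with a short sketch. The snippet below is a minimal NumPy illustration of first- and second-order *time* scattering, the baseline that the joint time-frequency transform refines; the Gabor filter parameters (`xi`, `sigma`, the number of octaves) are illustrative assumptions rather than the paper's filter bank, and the joint transform would additionally convolve along the log-frequency axis.

```python
import numpy as np

def gabor_filters(T, n_octaves=6, q=1):
    """Analytic Gabor band-pass filters on a dyadic grid, in the Fourier domain.
    (Illustrative parameters; real scattering filter banks are more careful.)"""
    freqs = np.fft.fftfreq(T)           # normalized frequencies in [-0.5, 0.5)
    filters = []
    for j in range(n_octaves):
        xi = 0.4 * 2.0 ** (-j)          # center frequency of octave j
        sigma = xi / (2.0 * q)          # bandwidth shrinks with frequency
        psi_hat = np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))
        psi_hat[freqs < 0] = 0.0        # analytic: keep positive frequencies only
        filters.append(psi_hat)
    return filters

def lowpass(T, n_octaves=6):
    """Gaussian low-pass phi whose bandwidth matches the coarsest wavelet scale."""
    freqs = np.fft.fftfreq(T)
    sigma = 0.4 * 2.0 ** (-n_octaves)
    return np.exp(-(freqs ** 2) / (2 * sigma ** 2))

def wavelet_modulus(x, psi_hat):
    """|x * psi| computed by pointwise multiplication in the Fourier domain."""
    return np.abs(np.fft.ifft(np.fft.fft(x) * psi_hat))

def time_scattering(x, n_octaves=6):
    """First- and second-order time scattering coefficients of a 1-D signal."""
    T = len(x)
    psis = gabor_filters(T, n_octaves)
    phi_hat = lowpass(T, n_octaves)
    S1, S2 = [], []
    for j1, psi1 in enumerate(psis):
        u1 = wavelet_modulus(x, psi1)   # first wavelet-modulus layer
        S1.append(np.real(np.fft.ifft(np.fft.fft(u1) * phi_hat)))  # time average
        for j2 in range(j1 + 1, n_octaves):   # only slower modulation scales
            u2 = wavelet_modulus(u1, psis[j2])  # second wavelet-modulus layer
            S2.append(np.real(np.fft.ifft(np.fft.fft(u2) * phi_hat)))
    return np.array(S1), np.array(S2)

# Example: a linear chirp, the kind of signal the paper uses to contrast
# time scattering with the joint time-frequency scattering transform.
t = np.linspace(0, 1, 2 ** 12)
x = np.cos(2 * np.pi * (50 * t + 400 * t ** 2))
S1, S2 = time_scattering(x)
print(S1.shape, S2.shape)               # (6, 4096) and (15, 4096)
```

Averaging by the low-pass `phi` is what yields local time-shift invariance, while the modulus discards phase so that the averaged coefficients remain informative; each wavelet-modulus-average stage is one layer of the fixed-filter convolutional network described in the abstract.
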
Year | Venue | Field
---|---|---
2018 | arXiv: Sound | Convolutional neural network, Computer science, Convolution, Network architecture, Algorithm, Speech recognition, Scattering, Invariant (mathematics), Chirp, Time–frequency analysis, Wavelet

DocType | Volume | Citations
---|---|---
Journal | abs/1807.08869 | 2

PageRank | References | Authors
---|---|---
0.38 | 17 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Joakim Andén | 1 | 64 | 7.70 |
Vincent Lostanlen | 2 | 27 | 8.88
Stéphane Mallat | 3 | 110 | 9.92 |