Abstract |
---|
We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear subword units that are present in speech. In order to incorporate this long-time frequency structure of audio events, w... |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/TMM.2017.2751969 | IEEE Transactions on Multimedia |

Keywords | DocType | Volume |
---|---|---|
Feature extraction, Hidden Markov models, Mel frequency cepstral coefficient, Visualization, Speech, Network architecture | Journal | 20 |

Issue | ISSN | Citations |
---|---|---|
3 | 1520-9210 | 18 |

PageRank | References | Authors |
---|---|---|
0.91 | 7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Naoya Takahashi | 1 | 34 | 9.44 |
Michael Gygli | 2 | 232 | 14.18 |
Luc Van Gool | 3 | 27566 | 1819.51 |