Abstract |
---|
This work describes systems for detecting semantic categories present in news video. The multimedia data was processed in three ways: the audio signal was converted to a sequence of acoustic features, automatic speech recognition provided a word-level transcription, and image features were computed for selected frames of the video signal. Primary acoustic, speech, and vision systems were trained to discriminate instances of the categories. Higher-level systems exploited correlations among the categories, incorporated sequential context, and combined the joint evidence from the three information sources. We present experimental results from the TREC Video Retrieval Evaluation (TRECVID). |
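The abstract does not say how the higher-level systems combine evidence, so the sketch below is a hypothetical illustration only, not the authors' method. It assumes a common late-fusion scheme: each primary system (audio, speech, vision) outputs a per-shot probability for one category, the probabilities are merged as a weighted sum of log-odds, and sequential context is approximated by blending each shot with its neighbours. The modality weights, the `alpha` blending parameter, and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List


def logit(p: float, eps: float = 1e-6) -> float:
    """Convert a probability to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical late fusion: weighted sum of per-modality log-odds."""
    z = sum(weights[m] * logit(p) for m, p in scores.items())
    return sigmoid(z)


def smooth(probs: List[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical sequential context: blend each shot with its neighbours' mean."""
    out = []
    for i, p in enumerate(probs):
        neighbours = probs[max(i - 1, 0):i] + probs[i + 1:i + 2]
        ctx = sum(neighbours) / len(neighbours) if neighbours else p
        out.append((1 - alpha) * p + alpha * ctx)
    return out


# Illustrative weights and per-shot scores for a single category (assumed values).
weights = {"audio": 0.2, "speech": 0.3, "vision": 0.5}
shots = [
    {"audio": 0.6, "speech": 0.7, "vision": 0.8},
    {"audio": 0.5, "speech": 0.5, "vision": 0.5},
    {"audio": 0.2, "speech": 0.1, "vision": 0.3},
]
fused = [fuse(s, weights) for s in shots]
print([round(p, 3) for p in smooth(fused)])
```

Fusing in log-odds space lets a confident modality outweigh uncertain ones (a score of 0.5 contributes nothing). Exploiting correlations among categories, which the abstract also mentions, would need an additional layer that is not sketched here.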
Year | Venue | DocType |
---|---|---|
2006 | TRECVID | Conference |
Citations | PageRank | References |
---|---|---|
6 | 1.28 | 5 |
Authors |
---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Slav Petrov | 1 | 2405 | 107.56 |
Arlo Faria | 2 | 66 | 7.87 |
Pascal Michaillat | 3 | 6 | 1.61 |
Alexander C. Berg | 4 | 10554 | 630.24 |
Andreas Stolcke | 5 | 6690 | 712.46 |
Dan Klein | 6 | 8083 | 495.21 |
Jitendra Malik | 7 | 39445 | 3782.10 |