Title
Automatic multimedia indexing: combining audio, speech, and visual information to index broadcast news
Abstract
This paper describes an indexing system that automatically creates metadata for multimedia broadcast news content by integrating audio, speech, and visual information. The system includes acoustic segmentation (AS), automatic speech recognition (ASR), topic segmentation (TS), and video indexing features. In the AS module, new spectral-based features and a smoothing method improved speech detection in the audio stream of the input news content. In the speech recognition module, automatic selection of acoustic models achieved both a low word error rate (WER), as with parallel recognition using multiple acoustic models, and fast recognition, as with a single acoustic model. The TS method using word concept vectors achieved more accurate results than the conventional method using local word frequency vectors. The information integration module integrates the results from the AS, TS, and SC modules. Combining the TS results with the AS and SC results improved story boundary detection accuracy compared with using the TS results alone.
Year: 2006
DOI: 10.1109/MSP.2006.1621450
Venue: Signal Processing Magazine, IEEE
Keywords: acoustic signal processing, audio signal processing, database indexing, multimedia databases, speech recognition, acoustic segmentation, audio information, automatic multimedia content indexing system, automatic speech recognition, boundary detection accuracy, information integration module, multimedia broadcast news metadata, spectral-based features, speech detection performance, speech information, topic segmentation, video indexing features, visual information, word frequency vectors
DocType: Journal
Volume: 23
Issue: 2
ISSN: 1053-5888
Citations: 8
PageRank: 0.49
References: 9
Authors: 4
Name | Order | Citations | PageRank
Ohtsuki, K. | 1 | 12 | 0.94
Bessho, K. | 2 | 8 | 0.49
Matsuo, Y. | 3 | 13 | 1.72
Matsunaga, S. | 4 | 477 | 49.70