Title
Representation and linking mechanisms for audio in MPEG-7
Abstract
This paper proposes a general framework for the description of audio within audiovisual sequences for MPEG-7. The related descriptors and description schemes were initially defined during the first phase of MPEG-7 and then evaluated at the Lancaster meeting held in February 1999. The proposals rest on the premise that audio content can be expressed by a combination of two synergistic representations, both of which are necessary to represent audio content accurately. The first is a structured or semantic representation of audio, such as a sentence, paragraph, score, or class. The second is an unstructured representation of the audio as a continuous stream of data. Since it is not possible to express all aspects of audio in a structured representation, powerful linking mechanisms are required between the two representations. We propose an audio description scheme as a basic structure and representation for audio based on hierarchical temporal segments. Such a description scheme is essential both for ease of description and for content-based indexing and retrieval of audio. We also propose a description scheme for the representation of larger structures such as spoken content in audio, where the annotation is generated using automatic speech recognition. Finally, we propose linking mechanisms between structured descriptions and unstructured audio content, as an example facility that would add great power to both of the previously mentioned description frameworks. (C) 2000 Elsevier Science B.V. All rights reserved.
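The idea of hierarchical temporal segments linked to an unstructured sample stream can be illustrated with a minimal sketch. This is not the paper's description scheme or MPEG-7 syntax; the `Segment` class, its field names, and the 16 kHz sample rate are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical structured unit: a labelled time span in seconds."""
    label: str
    start: float
    end: float
    children: List["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        # A child segment must lie within its parent's time span.
        if child.start < self.start or child.end > self.end:
            raise ValueError("child segment exceeds parent bounds")
        self.children.append(child)
        return child

    def sample_range(self, sample_rate: int) -> Tuple[int, int]:
        # Link from the structured description to the raw sample stream.
        return (round(self.start * sample_rate), round(self.end * sample_rate))

    def find(self, t: float) -> List["Segment"]:
        # Path from this segment down to the deepest one containing time t.
        if not (self.start <= t < self.end):
            return []
        for child in self.children:
            path = child.find(t)
            if path:
                return [self] + path
        return [self]


# Assumed example data: a paragraph containing a sentence containing a word.
paragraph = Segment("paragraph", 0.0, 10.0)
sentence = paragraph.add(Segment("sentence", 2.0, 5.5))
sentence.add(Segment("word", 2.0, 2.4))

print(sentence.sample_range(16000))               # structured -> unstructured
print([s.label for s in paragraph.find(2.1)])     # unstructured -> structured
```

Here `sample_range` maps a structured unit onto a span of the continuous stream, while `find` goes the other way, from a point in time to the units that contain it.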
Year: 2000
DOI: 10.1016/S0923-5965(00)00025-4
Venue: Signal Processing: Image Communication
Keywords: MPEG-7 audio, audio structure descriptions, spoken content, speech recognition transcriptions, linking mechanisms
Field: Audio signal, Computer vision, Knowledge representation and reasoning, Annotation, Computer science, Audio mining, Search engine indexing, Paragraph, Artificial intelligence, Natural language processing, Audio description, Sentence
DocType: Journal
Volume: 16
Issue: 1-2
ISSN: 0923-5965
Citations: 4
PageRank: 2.93
References: 4
Authors: 5
Name
Order
Citations
PageRank
Adam T. Lindsay 1 8811.29
Savitha Srinivasan 2 55681.89
Jason Peter Andrew Charlesworth 3 63.70
Philip N. Garner 4 30441.04
Werner Kriechbaum 5 114.56