Representation and linking mechanisms for audio in MPEG-7 - Citegraph

Paper Info

Title
Representation and linking mechanisms for audio in MPEG-7

Abstract
This paper proposes a general framework for the description of audio within audiovisual sequences for MPEG-7. These related descriptors and description schemes were initially defined during the first phase of MPEG-7 and then evaluated during the Lancaster Meeting held in February 1999. These proposals are based on the underlying premise that audio content can be expressed by a combination of two synergistic representations, both of which are necessary to represent audio content accurately. The first is a structured or semantic representation of audio such as a sentence, paragraph, score, or class. The second is an unstructured representation of the audio, represented simply as a continuous stream of data. Since it is not possible to express all aspects of audio in a structured representation, powerful linking mechanisms are required between these two representations. We propose an audio description scheme as a basic structure and representation for audio based on hierarchical, temporal segments. Such a description scheme is essential both for ease of description and for supporting content-based indexing and retrieval of audio. We also propose a description scheme for the representation of larger structures such as spoken content in audio, where the annotation is generated using automatic speech recognition. Finally, we propose linking mechanisms between structured descriptions and unstructured audio content, as an example facility that would add great power to both of the previously mentioned description frameworks. © 2000 Elsevier Science B.V. All rights reserved.

Year	DOI	Venue
2000	10.1016/S0923-5965(00)00025-4	SIGNAL PROCESSING-IMAGE COMMUNICATION
Keywords	Field	DocType
MPEG-7 audio, audio structure descriptions, spoken content, speech recognition transcriptions, linking mechanisms	Audio signal, Computer vision, Knowledge representation and reasoning, Annotation, Computer science, Audio mining, Search engine indexing, Paragraph, Artificial intelligence, Natural language processing, Audio description, Sentence	Journal
Volume	Issue	ISSN
16	1-2	0923-5965
Citations	PageRank	References
4	2.93	4
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Adam T. Lindsay	1	88	11.29
Savitha Srinivasan	2	556	81.89
Jason Peter Andrew Charlesworth	3	6	3.70
Philip N. Garner	4	304	41.04
Werner Kriechbaum	5	11	4.56