Title
Analysis of the Data Quality of Audio Descriptions of Environmental Sounds
Abstract
In this paper, we perform a statistical data analysis of a broad set of state-of-the-art audio features and low-level MPEG-7 audio descriptors. The investigation aims to reveal redundancies between state-of-the-art audio features and MPEG-7 audio descriptors. We introduce a novel measure that evaluates the information content of a descriptor in terms of variance. Statistical data analysis reveals the amount of variance contained in a feature and enables the identification of independent and redundant features. This approach supports the efficient selection of orthogonal features for content-based retrieval. We believe that a good feature should provide descriptions with high variance for the underlying data, and that combinations of features should consist of decorrelated features in order to increase the expressiveness of the descriptions. Although MPEG-7 is a popular and widely used standard for multimedia description, only a few investigations address the data quality of low-level MPEG-7 descriptions.

Over the last decades, a large number of features have been developed for the analysis of audio content. One of the first application domains of audio analysis was speech recognition (14). With the emergence of new application areas, the analysis of music and general-purpose environmental sounds gained importance. Different research fields evolved, such as audio segmentation, music information retrieval (MIR), and environmental sound recognition (ESR), and each of these areas developed its own description techniques (features). Today, features are often employed in domains other than the ones they were originally designed for. A recent effort to standardize multimedia description tools led to the MPEG-7 standard, an ISO/IEC standard for multimedia content description (13). The standard defines low-level description techniques (including audio) as well as high-level tools for multimedia processing.
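The redundancy analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes that feature values are stored column-wise in a NumPy array (one row per sound, one column per descriptor) and uses pairwise Pearson correlation as a stand-in for the paper's statistical data analysis, which may differ. The function name `redundant_pairs`, the 0.9 threshold, and the feature names in the example are hypothetical.

```python
import numpy as np


def redundant_pairs(features, names, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold.

    features: array of shape (n_samples, n_features), one column per descriptor.
    r is the Pearson correlation coefficient between two columns.
    """
    # rowvar=False: columns are variables (features), rows are observations;
    # atleast_2d guards against the scalar result returned for a single column
    corr = np.atleast_2d(np.corrcoef(features, rowvar=False))
    pairs = []
    n_features = corr.shape[0]
    for i in range(n_features):
        for j in range(i + 1, n_features):
            r = corr[i, j]
            # constant columns yield NaN, and NaN comparisons are False, so they are skipped
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], float(r)))
    return pairs


# Synthetic example with hypothetical feature names
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([
    x,
    2.0 * x + rng.normal(scale=0.05, size=500),  # nearly a rescaled copy of column 0
    rng.normal(size=500),                         # independent of the others
])
pairs = redundant_pairs(data, ["centroid", "centroid_scaled", "noise"])
print(pairs)  # only the (centroid, centroid_scaled) pair is expected
```

Each flagged pair is a candidate for dropping one of its members, which moves the selected combination closer to a decorrelated set. Pearson correlation only captures linear dependence, so features that are statistically dependent in a nonlinear way would not be flagged by this sketch.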
The large number of existing features makes it difficult to select the most appropriate feature set for a given task. Statistical data analysis can help to identify independent features. In this paper, we perform a quantitative analysis of low-level MPEG-7 audio descriptors in the domain of environmental sounds. We compare the MPEG-7 descriptors to a set of state-of-the-art audio features that we previously analyzed in the same domain (19). We investigate the different description techniques by statistical data analysis in order to identify similarities and redundancies. Redundant features describe similar properties of the underlying data, while statistically independent features contain orthogonal information. The objective of feature selection is to combine orthogonal features so as to maximize the amount of represented information. The method proposed in this paper supports the identification of independent and redundant features. Furthermore, we evaluate selected MPEG-7 high-level tools from this point of view. Additionally, we investigate the amount of information (entropy) contained in each feature. The information content of a feature increases with the variance of the feature values for a given dataset. We derive a measure that represents the information contained in a feature with respect to its variance, in order to evaluate the expressiveness of a feature.
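A variance-based information measure can be sketched as follows. This is not the measure derived in the paper: it assumes that each feature is approximately Gaussian, in which case its differential entropy is h = ½ ln(2πeσ²) and therefore grows monotonically with the variance σ². Features are min-max scaled first so that descriptors measured in different units can be compared; the function name `gaussian_entropy` and the scaling step are assumptions of this sketch.

```python
import numpy as np


def gaussian_entropy(features):
    """Differential entropy (in nats) of each column under a Gaussian assumption.

    Columns are min-max scaled to [0, 1] first so that features measured in
    different units become comparable (an assumption of this sketch).
    """
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    scaled = (features - lo) / span
    var = scaled.var(axis=0, ddof=1)  # unbiased sample variance per feature
    # h = 0.5 * ln(2*pi*e*var); a constant column has var 0 and gets -inf
    with np.errstate(divide="ignore"):
        return 0.5 * np.log(2 * np.pi * np.e * var)


# Synthetic example: three hypothetical features with very different spread
rng = np.random.default_rng(0)
n = 1000
spread = rng.uniform(size=n)  # values spread evenly over their range
spiky = np.zeros(n)
spiky[:10] = 1.0              # almost constant, with a few outliers
constant = np.full(n, 3.0)    # carries no information at all
h = gaussian_entropy(np.column_stack([spread, spiky, constant]))
ranking = np.argsort(h)[::-1]  # column indices from most to least informative
print(h, ranking)
```

Because the Gaussian entropy is a monotonic function of the variance, ranking features by h gives the same order as ranking them by scaled variance; the logarithm only changes the scale. Min-max scaling is sensitive to outliers, so a more robust scaling (for example by the interquartile range) may be preferable on real descriptor data.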
Year: 2007
Venue: JDIM
Keywords: data quality, speech recognition, audio analysis, quantitative analysis, information content, information entropy, data analysis, feature selection, statistical independence
Field: Environmental sounds, Data quality, Information retrieval, Audio mining, Computer science, Expressivity
DocType: Journal
Volume: 5
Issue: 2
Citations: 5
PageRank: 0.59
References: 8
Authors: 3
Name
Order
Citations
PageRank
Dalibor Mitrovic1766.23
Matthias Zeppelzauer218621.35
Horst Eidenberger316721.30