Machine-learning based classification of speech and music - Citegraph

Paper Info

Title
Machine-learning based classification of speech and music

Abstract
The need to classify audio into categories such as speech or music is an important aspect of many multimedia document retrieval systems. In this paper, we investigate audio features that have not been previously used in music-speech classification, such as the mean and variance of the discrete wavelet transform, the variance of Mel-frequency cepstral coefficients, the root mean square of a lowpass signal, and the difference of the maximum and minimum zero-crossings. We then apply fuzzy C-means clustering to the problem of selecting a viable set of features that enables better classification accuracy. Three different classification frameworks have been studied: Multi-Layer Perceptron (MLP) neural networks, radial basis function (RBF) neural networks, and Hidden Markov Models (HMM); the results of each framework are reported and compared. Our extensive experimentation has identified a subset of features that contributes most to accurate classification and has shown that MLP networks are the most suitable classification framework for the problem at hand.

Year	2006
DOI	10.1007/s00530-006-0034-0
Venue	Multimedia Systems
Keywords	radial basis function, discrete wavelet transform, document retrieval, machine learning, root mean square, multi layer perceptron, mel frequency cepstral coefficient, hidden markov model
Field	Mel-frequency cepstrum, Computer science, Discrete wavelet transform, Artificial intelligence, Audio signal processing, Cluster analysis, Artificial neural network, Pattern recognition, Fuzzy logic, Speech recognition, Hidden Markov model, Perceptron, Machine learning
DocType	Journal
Volume	12
Issue	1
ISSN	1432-1882
Citations	18
PageRank	0.80
References	25
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
M. Kashif Saeed Khan	1	26	1.35
Wasfi G. Al-Khatib	2	139	11.03
