Speech and crosstalk detection in multichannel audio - Citegraph

Paper Info

Title
Speech and crosstalk detection in multichannel audio

Abstract
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features were considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation. Index Terms: Crosstalk, Cochannel interference, meetings, feature extraction, hidden Markov models (HMM), speech recognition.
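
Two of the feature families named in the abstract, kurtosis and cross-correlation, can be illustrated with a minimal sketch. The abstract does not give the paper's exact feature definitions, frame sizes, or normalization, so the function names (`frame_kurtosis`, `max_norm_xcorr`) and every choice below are illustrative assumptions, not the authors' implementation. "Fundamentalness" is omitted because the abstract does not define it.

```python
import numpy as np


def frame_kurtosis(frame):
    """Kurtosis (fourth standardized moment) of one audio frame.

    Hypothetical helper: the paper's exact kurtosis feature is not
    specified in the abstract.
    """
    x = np.asarray(frame, dtype=float)
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    if var == 0.0:
        return 0.0  # assumption: report 0 for a constant (silent) frame
    return float(np.mean(centered ** 4) / var ** 2)


def max_norm_xcorr(a, b):
    """Peak absolute normalized cross-correlation of two frames over all lags.

    Hypothetical helper: one plausible cross-channel metric, not
    necessarily the one used in the paper.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom == 0.0:
        return 0.0  # assumption: no correlation if either frame is silent
    xcorr = np.correlate(a, b, mode="full")  # all relative lags
    return float(np.max(np.abs(xcorr)) / denom)
```

In a setup like the one the abstract describes, values of this kind would be computed per frame and per channel and passed as a feature vector to a classifier such as a GMM or HMM. Treat that pipeline as a reading of the abstract rather than a confirmed detail of the paper.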

Year	DOI	Venue
2005	10.1109/TSA.2004.838531	IEEE Transactions on Speech and Audio Processing
Keywords	Field	DocType
acoustic signal detection,crosstalk,hidden Markov models,microphones,pattern classification,speech recognition,Gaussian mixture model classifier,automatic audio classification,automatic speech recognition,cross-correlation metric,crosstalk detection,ergodic hidden Markov model classifier,kurtosis metric,local speech,microphone,multichannel audio,speech detection	Speech processing,Pattern recognition,Computer science,Audio mining,Markov model,Voice activity detection,Speech recognition,Artificial intelligence,Hidden Markov model,Microphone,Mixture model,Acoustic model	Journal
Volume	Issue	ISSN
13	1	1063-6676
Citations	PageRank	References
52	3.44	12
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Stuart N. Wrigley	1	181	20.56
Guy J. Brown	2	760	97.54
Vincent Wan	3	373	35.85
Steve Renals	4	2570	293.02