Title
Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data
Abstract
This paper presents a design strategy for the speaker diarization system in the IIR submissions to the 2007 and 2009 NIST Rich Transcription Meeting Recognition Evaluations (RT07 and RT09) for the multiple distant microphone (MDM) condition. The system features two algorithms that support two important steps of the diarization process: Initial Segmentation and Clustering (ISC), followed by cluster merging and purification. In the ISC step, we propose a histogram quantization and clustering technique based on time delay of arrival (TDOA) features, which are obtained by analyzing the correlation among the signals across multiple distant microphones. In the cluster merging and purification step, we merge the speaker clusters using the Bayesian information criterion (BIC), consolidating them toward one cluster per speaker. For the purification process, we propose a novel Consensus Based Cluster Purification (CBCP) method that removes impure speaker segments from the speaker clusters before speaker modeling. The two steps work in tandem to form an integral process. The system achieves state-of-the-art speaker diarization performance for the RT07 and RT09 MDM condition, with diarization error rates (DERs) of 7.47% and 8.77%, respectively, for both overlapping and non-overlapping speech.
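The BIC-based merging decision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the paper, so the code below assumes the common textbook form in which each cluster is modeled by a single full-covariance Gaussian; the function names `logdet_cov` and `delta_bic`, the penalty weight `lam`, and the synthetic feature data are all hypothetical choices made for illustration.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modeling x and y separately versus jointly.

    Assumption: one full-covariance Gaussian per cluster (not necessarily
    the model used in the paper). A positive value favors keeping the two
    clusters apart; a negative value favors merging them.
    """
    n_x, d = x.shape
    n_y = y.shape[0]
    n = n_x + n_y
    pooled = np.vstack([x, y])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n * logdet_cov(pooled)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))
    # Penalty for the extra mean vector and covariance matrix.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


# Synthetic stand-ins for per-cluster acoustic feature vectors.
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 4))
same_b = rng.normal(0.0, 1.0, size=(500, 4))
other = rng.normal(3.0, 1.0, size=(500, 4))

print("same source:", delta_bic(same_a, same_b))
print("different source:", delta_bic(same_a, other))
```

In this sketch, two clusters drawn from the same distribution yield a negative value (merge), while clusters with clearly different means yield a positive value (keep separate). A merging loop would repeatedly merge the pair with the lowest value until no pair falls below zero.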
Year
2012
DOI
10.1109/TASL.2011.2159203
Venue
IEEE Transactions on Audio, Speech & Language Processing
Keywords
isc step, one-cluster-per-speaker technique, cluster purification methods, multiple distant microphone, tdoa features, pattern clustering, bayesian information criterion, rt09 evaluation meeting data, expert systems, initial segmentation and clustering, cbcp method, signal correlation, bayes methods, multiple distant microphone condition, bic, meeting audio, histogram quantization technique, rt07 evaluation meeting data, speaker modeling, speaker recognition, speaker cluster, speaker diarization system, consensus based cluster purification method, clustering methods, rt09 evaluation, feature extraction, cluster purification process, nist rich transcription meeting recognition evaluations, diarization error rates, speaker clustering, diarization process, time delay of arrival features, microphone arrays, speaker diarization, important step, cluster merging, time-of-arrival estimation, impure speaker segment, delay estimation, correlation methods, error rate, expert system
Field
Histogram, Bayesian information criterion, Pattern recognition, Computer science, Segmentation, Feature extraction, Speech recognition, Speaker recognition, Artificial intelligence, Speaker diarisation, Quantization (signal processing), Cluster analysis
DocType
Journal
Volume
20
Issue
2
ISSN
1558-7916
Citations
2
PageRank
0.37
References
5
Authors
4
Name            Order   Citations   PageRank
Tin Lay Nwe     1       479         34.59
Hanwu Sun       2       98          14.15
Bin Ma          3       600         47.26
Haizhou Li      4       3678        334.61