Title
The 2015 Sheffield system for longitudinal diarisation of broadcast media
Abstract
Speaker diarisation is the task of answering "who spoke when" within a multi-speaker audio recording. Diarisation of broadcast media typically operates on individual television shows and is particularly difficult because of the high number of speakers and challenging background conditions. Prior knowledge, such as that from previous shows in a series, can improve performance. Longitudinal diarisation makes it possible to use knowledge from previous audio files in this way, but it requires finding matching speakers across consecutive files. This paper describes the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge. The challenge required longitudinal diarisation of data from BBC archives under very constrained resource settings. Our system consists of three main stages: speech activity detection using DNNs with novel adaptation and decoding methods; speaker segmentation and clustering, with adaptation of the DNN-based clustering models; and finally speaker linking to match speakers across shows. On the development set of 19 shows from five different television series, the final system achieved a Diarisation Error Rate (DER) of 50.77% in the diarisation and linking task.
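To make the reported metric concrete, the following is a minimal sketch of how a Diarisation Error Rate can be computed. It is not the scoring tool used in the MGB challenge: it assumes equal-length frames, at most one speaker per frame (no overlapping speech), no scoring collar, and a brute-force search over speaker mappings that is only practical for a handful of speakers. The function name `frame_der` and the toy labels are hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER; None marks a non-speech frame.

    DER = (missed speech + false alarm + speaker confusion) / reference speech.
    Assumes one label per frame and sequences of equal length.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same number of frames")
    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Frames where both reference and hypothesis contain speech.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_ids = sorted({r for r, _ in both} | {r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})

    # Pad with None so surplus hypothesis speakers can stay unmapped.
    targets = ref_ids + [None] * max(0, len(hyp_ids) - len(ref_ids))

    # Try every one-to-one mapping of hypothesis speakers to reference
    # speakers and keep the one with the most correctly labelled frames.
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / ref_speech


# Toy example: 7 reference speech frames, with 1 missed frame,
# 1 false-alarm frame and 1 speaker-confusion frame.
reference = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hypothesis = ["x", "x", "y", None, "y", "y", None, "y", "y", None]
der = frame_der(reference, hypothesis)
```

Because the denominator is reference speech time only, false alarms and confusions can push the value above 100%. Hypothesis speaker names are arbitrary, so only the best one-to-one mapping to reference speakers is scored.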
Year: 2015
DOI: 10.1109/ASRU.2015.7404855
Venue: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Keywords: speaker diarisation, linking, neural networks, adaptation
Field: Broadcasting, Computer science, Voice activity detection, Segmentation, Word error rate, Speech recognition, Speaker recognition, Speaker diarisation, Natural language processing, Artificial intelligence, Cluster analysis, Sound recording and reproduction
DocType: Conference
Citations: 4
PageRank: 0.43
References: 15
Authors (6)
Name               Order  Citations  PageRank
Rosanna Milner     1      11         2.59
Oscar Saz          2      142        16.30
Salil Deena        3      27         3.61
Mortaza Doulaty    4      33         5.35
Raymond W. M. Ng   5      340        21.61
Thomas Hain        6      18         4.50