Multimodal speaker clustering in full length movies - Citegraph

Paper Info

Title
Multimodal speaker clustering in full length movies

Abstract
Multimodal clustering/diarization tries to answer the question "who spoke when" by using audio and visual information. Diarization consists of two steps: first, segmentation of the audio information and detection of the speech segments, and then clustering of the speech segments to group the speakers. This task has been mainly studied on audiovisual data from meetings, news broadcasts or talk shows. In this paper, we use visual information to aid speaker clustering and we introduce a new video-based feature, called actor presence, that can be used to enhance audio-based speaker clustering. We tested the proposed method on three full-length stereoscopic movies, i.e. a scenario much more difficult than the ones used so far, where there is no certainty that speech segments and video appearances of actors will always overlap. The results proved that the visual information can improve the speaker clustering accuracy and hence the diarization process.

Year	2017
DOI	10.1007/s11042-015-3181-5
Venue	Multimedia Tools Appl.
Keywords	Multimodal, Diarization, Clustering, Movies, Actor presence
Field	Certainty, Computer science, Segmentation, Stereoscopy, Speech recognition, Speaker diarisation, Cluster analysis
DocType	Journal
Volume	76
Issue	2
ISSN	1573-7721
Citations	5
PageRank	0.44
References	24
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Ioannis Kapsouras	1	46	4.03
Anastasios Tefas	2	2055	177.05
Nikolaos Nikolaidis	3	108	10.31
Geoffroy Peeters	4	523	62.99
L. Benaroya	5	5	0.44
Ioannis Pitas	6	6478	626.09
