Automatic turn segmentation for Movie & TV subtitles - Citegraph

Paper Info

Title
Automatic turn segmentation for Movie & TV subtitles

Abstract
Movie and TV subtitles contain large amounts of conversational material, but lack an explicit turn structure. This paper presents a data-driven approach to the segmentation of subtitles into dialogue turns. Training data is first extracted by aligning subtitles with transcripts in order to obtain speaker labels. This data is then used to build a classifier whose task is to determine whether two consecutive sentences are part of the same dialogue turn. The approach relies on linguistic, visual and timing features extracted from the subtitles themselves and does not require access to the audiovisual material, although speaker diarization can be exploited when audio data is available. The approach also exploits alignments with related subtitles in other languages to further improve the classification performance. The classifier achieves an accuracy of 78% on a held-out test set. A follow-up annotation experiment demonstrates that this task is also difficult for human annotators.
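
As an illustration of the kind of pairwise decision the abstract describes, the sketch below computes a few features for two consecutive subtitle sentences and applies a hand-written rule. Everything in it is an assumption made for illustration: the `Sentence` type, the three features (time gap, leading dash, preceding question mark), the `max_gap` threshold and the rule itself are hypothetical, and the rule is only a stand-in for the trained classifier, whose actual features and model are not specified in this record.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """A subtitle sentence with its display interval (hypothetical type)."""
    text: str
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video


def pair_features(prev: Sentence, curr: Sentence) -> dict:
    """Illustrative features for two consecutive sentences.

    These are assumed examples of timing, visual and linguistic cues,
    not the feature set used in the paper.
    """
    return {
        # Timing cue: silence between the two sentences.
        "gap_seconds": curr.start - prev.end,
        # Visual cue: a leading dash often marks a new speaker in subtitles.
        "starts_with_dash": curr.text.lstrip().startswith("-"),
        # Linguistic cue: a question is often answered by someone else.
        "prev_is_question": prev.text.rstrip().endswith("?"),
    }


def same_turn_baseline(prev: Sentence, curr: Sentence, max_gap: float = 1.0) -> bool:
    """Toy rule standing in for a trained classifier (assumption)."""
    f = pair_features(prev, curr)
    return (
        f["gap_seconds"] <= max_gap
        and not f["starts_with_dash"]
        and not f["prev_is_question"]
    )


if __name__ == "__main__":
    a = Sentence("Where are you going?", 10.0, 11.5)
    b = Sentence("- To the station.", 11.8, 13.0)
    c = Sentence("It leaves at noon.", 13.2, 14.5)
    print(same_turn_baseline(a, b))  # question followed by a dash
    print(same_turn_baseline(b, c))  # short gap, no boundary cue
```

In a data-driven setting, the rule in `same_turn_baseline` would be replaced by a model fitted on speaker-labelled sentence pairs, with the feature dictionary serving as its input.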

Year	2016
DOI	10.1109/SLT.2016.7846272
Venue	2016 IEEE Spoken Language Technology Workshop (SLT)
Keywords	automatic turn segmentation, movie subtitles, TV subtitles, conversational material, data-driven approach, subtitles segmentation, dialogue turns, speaker labels, classifier, linguistic feature extraction, visual feature extraction, timing feature extraction, speaker diarization, audio data, classification performance
Field	Pragmatics, Visualization, Computer science, Segmentation, Audiovisual Material, Speech recognition, Feature extraction, Natural language processing, Artificial intelligence, Speaker diarisation, Classifier (linguistics), Test set
DocType	Conference
ISSN	2639-5479
ISBN	978-1-5090-4904-2
Citations	0
PageRank	0.34
References	0
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Pierre Lison	1	146	12.35
Raveesh Meena	2	35	4.14

Cited by (0 rows)

References (0 rows)
