Title
MMToC: A Multimodal Method for Table of Content Creation in Educational Videos
Abstract
We propose MMToC, a multimodal method for automatically creating a table of content for educational videos. MMToC defines and quantifies word saliency for visual words extracted from the slides and for spoken words obtained from the speech transcript. The saliency scores from these two modalities are combined to obtain a ranked list of salient words. The ranked words and their saliency scores are used to formulate a topic segmentation cost function, which is optimized with dynamic programming to obtain the topic segments of the video. Each segment is then labelled with its topic name to create the table of content. We perform experiments on 24 hours of lectures spread across 23 videos, each 20 to 75 minutes long. Compared with LDA-based video segmentation approaches, MMToC performs significantly better (F-score improvements of 0.19 and 0.24 on two datasets). We also perform a user study to demonstrate the effectiveness of MMToC for navigating educational videos.
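The pipeline described in the abstract (combine per-word saliency from two modalities, segment with dynamic programming, label each segment) can be illustrated with a minimal Python sketch. This is not the authors' formulation: the abstract does not give the saliency definitions or the cost function, so the weighted-sum combination, the mixing weight `alpha`, the `segment_cost` definition, and the per-segment `penalty` are all hypothetical choices made only for illustration.

```python
from collections import Counter


def combine_saliency(visual, spoken, alpha=0.5):
    """Rank words by a weighted sum of two saliency maps.

    ASSUMPTION: a linear mix with weight `alpha`; the abstract does not
    say how the two modalities are actually combined.
    """
    words = set(visual) | set(spoken)
    combined = {
        w: alpha * visual.get(w, 0.0) + (1 - alpha) * spoken.get(w, 0.0)
        for w in words
    }
    # Highest saliency first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


def _saliency_mass(windows, saliency, start, end):
    """Total saliency accumulated by each word in windows[start:end]."""
    mass = Counter()
    for window in windows[start:end]:
        for word in window:
            mass[word] += saliency.get(word, 0.0)
    return mass


def segment_cost(windows, saliency, start, end):
    """HYPOTHETICAL cost: saliency mass not explained by the dominant word."""
    mass = _saliency_mass(windows, saliency, start, end)
    if not mass:
        return 0.0
    return sum(mass.values()) - max(mass.values())


def segment(windows, saliency, penalty=0.5):
    """Dynamic program over window boundaries.

    best[j] is the minimum cost of segmenting windows[:j]; each segment
    adds its cost plus a fixed `penalty` (an assumed regulariser that
    discourages over-segmentation).
    """
    n = len(windows)
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(windows, saliency, i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers to recover (start, end) pairs in order.
    bounds = []
    j = n
    while j > 0:
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]


def label_segment(windows, saliency, start, end):
    """Name a segment after its highest-mass word (an assumed labelling rule)."""
    mass = _saliency_mass(windows, saliency, start, end)
    return max(sorted(mass), key=lambda w: mass[w]) if mass else ""
```

Here `windows` is assumed to be a time-ordered list of word lists (for example, one per slide or transcript chunk). Calling `segment` returns half-open index ranges, and `label_segment` turns each range into a table-of-content entry.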
Year
2015
DOI
10.1145/2733373.2806253
Venue
ACM Multimedia
Keywords
Multimodal, table of content, educational videos, visual saliency, text saliency, temporal segmentation, dynamic program
Field
Modalities, Salience (neuroscience), Computer science, Table of contents, Ranging, Artificial intelligence, Computer vision, Ranking, Segmentation, Speech recognition, Multimedia, Salient, Visual Word
DocType
Conference
Citations
8
PageRank
0.62
References
25
Authors
3
Name            Order  Citations  PageRank
Arijit Biswas   1      747        38.43
Ankit Gandhi    2      16         2.92
Om Deshmukh     3      56         10.55