Title
Text segmentation: A topic modeling perspective
Abstract
In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic-model-based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA-based approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA-based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The proposed modification is not specific to topic models; it applies to any algorithm that uses DP for text segmentation.
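For readers unfamiliar with DP-based segmentation, the following is a minimal sketch of the generic technique the abstract refers to, not the authors' implementation. It assumes a simple stand-in cost (the negative log-likelihood of a segment under a unigram model fitted to that segment) where the paper would use an LDA or MM score, and a hypothetical per-segment `penalty` to discourage over-segmentation. The names `segment_cost`, `segment`, and `penalty` are illustrative only. The speed-up proposed in the paper is not reproduced here.

```python
import math
from collections import Counter


def segment_cost(sentences):
    """Negative log-likelihood of the words in a candidate segment under a
    unigram model estimated from the segment itself (assumed stand-in for
    a topic-model score)."""
    counts = Counter(word for sent in sentences for word in sent)
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values())


def segment(sentences, penalty=2.0):
    """Return the end index of each segment in the minimum-cost segmentation.

    best[end] holds the lowest total cost of segmenting sentences[:end];
    back[end] holds the start of the last segment in that solution.
    """
    n = len(sentences)
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        # Try every possible start for the segment that finishes at `end`.
        for start in range(end):
            cost = best[start] + segment_cost(sentences[start:end]) + penalty
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Walk the back-pointers from the end of the text to recover boundaries.
    bounds = []
    end = n
    while end > 0:
        bounds.append(end)
        end = back[end]
    return sorted(bounds)


docs = [["cat", "dog"], ["dog", "cat"], ["stock", "bond"], ["bond", "stock"]]
print(segment(docs))  # [2, 4]
```

The double loop scores every candidate (start, end) pair, so the number of segment evaluations grows quadratically with the number of sentences. This is why the choice of scoring model dominates the running time when each evaluation is expensive.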
Year: 2011
DOI: 10.1016/j.ipm.2010.11.008
Venue: Inf. Process. Manage.
Keywords: proposed lda, topic modeling perspective, latent dirichlet allocation, text segmentation, semantic information, dynamic programming, unsupervised topic model, proposed modification, proposed topic model, topic model, dp algorithm, topic modeling, topic distribution, discourse analysis
Field: Dynamic topic model, Latent Dirichlet allocation, Computer science, Multinomial distribution, Artificial intelligence, Dynamic programming, Information retrieval, Pattern recognition, Segmentation, Semantic information, Text segmentation, Topic model, Machine learning
DocType: Journal
Volume: 47
Issue: 4
ISSN:
Journal: Information Processing and Management
Citations: 22
PageRank: 0.81
References: 25
Authors
4
Name | Order | Citations | PageRank
Hemant Misra | 1 | 175 | 14.91
François Yvon | 2 | 941 | 102.51
O. Cappe | 3 | 2112 | 207.95
Joemon M. Jose | 4 | 2782 | 198.37