Title
An automatic approach for efficient text segmentation
Abstract
This paper presents a domain-independent approach for partitioning text documents into a set of topic-coherent segments, where the structure of the segments reflects the pattern of sub-topics in the processed document. The approach adopts similarity analysis based on Shannon information theory to determine the topic distribution within a text document, without relying on thesaurus information or other auxiliary knowledge bases. It first examines the document from the viewpoint of each individual word, in terms of how consistently that word is distributed, and constructs a number of segmentation proposals accordingly. It then employs the K-means clustering technique to reach a consensus among these proposals and finally partitions the text into a set of topic-coherent paragraphs. Extensive experiments on real and synthetic data sources illustrate the effectiveness of the approach in text segmentation.
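The pipeline described in the abstract (per-word segmentation proposals, then a K-means consensus) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the information-theoretic similarity measure, so it is replaced here by a much cruder assumption, namely that each repeated word proposes a boundary just before its first occurrence and just after its last. The function names `boundary_proposals`, `kmeans_1d`, and `segment` are hypothetical, and the input is assumed to be a list of already tokenized sentences.

```python
from collections import defaultdict


def boundary_proposals(sentences):
    """Each repeated word proposes boundaries around the span it occupies.

    Assumption (not from the paper): a word's first and last occurrence
    delimit the region where its topic is active.
    """
    positions = defaultdict(list)
    for i, sent in enumerate(sentences):
        for word in set(sent):
            positions[word].append(i)

    n = len(sentences)
    proposals = []
    for pos in positions.values():
        if len(pos) < 2:
            continue  # a word seen once says nothing about topic extent
        first, last = pos[0], pos[-1]
        if first > 0:
            proposals.append(first)      # boundary before sentence `first`
        if last < n - 1:
            proposals.append(last + 1)   # boundary before sentence `last + 1`
    return proposals


def kmeans_1d(points, k, iters=50):
    """Plain Lloyd's k-means on scalars with evenly spread initial centers."""
    pts = sorted(points)
    centers = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def segment(sentences, n_segments):
    """Return consensus boundary indices (a boundary b falls before sentence b)."""
    proposals = boundary_proposals(sentences)
    k = n_segments - 1
    if k < 1 or len(proposals) < k:
        return []
    # Cluster centers are rounded to sentence indices; duplicates collapse.
    return sorted({round(c) for c in kmeans_1d(proposals, k)})


doc = [
    ["cats", "purr"], ["cats", "sleep"], ["cats", "purr", "sleep"],
    ["stocks", "fall"], ["stocks", "rise"], ["stocks", "fall", "rise"],
]
print(segment(doc, 2))  # [3]
```

In this toy document most words vote for a boundary before sentence 3, and the single cluster center settles there despite two stray proposals. The number of segments is passed in by hand; how the paper chooses it is not stated in the abstract.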
Year
2006
DOI
10.1007/11892960_51
Venue
KES (1)
Keywords
segmentation proposal, text segmentation, text document, partitioning text document, efficient text segmentation, k-means clustering technique, topic coherent paragraph, partition text, processed text document, automatic approach, topic distribution, domain-independent approach, synthetic data, k means clustering, information theory, knowledge base, k means, shannon information theory
Field
Information theory, k-means clustering, Data mining, Information retrieval, Computer science, Segmentation, Text segmentation, Synthetic data, Knowledge engineering, Case-based reasoning, Cluster analysis, Distributed computing
DocType
Conference
Volume
4251
ISSN
0302-9743
ISBN
3-540-46535-9
Citations
0
PageRank
0.34
References
9
Authors
4
Name          Order   Citations   PageRank
Keke Cai      1       243         15.36
Jiajun Bu     2       4106        211.52
Chun Chen     3       4727        246.28
Peng Huang    4       4           2.77