Title
Text Segmentation based on Semantic Word Embeddings.
Abstract
We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iterative refinement technique for improving the performance of greedy strategies. We compare our results to known benchmarks, using known metrics. We demonstrate state-of-the-art performance for an untrained method with our Content Vector Segmentation (CVS) on the Choi test set. Finally, we apply the segmentation procedure to an in-the-wild dataset consisting of text extracted from scholarly articles in the arXiv.org database.
Year: 2015
Venue: CoRR
Field: Iterative refinement, Scale-space segmentation, Segmentation, Computer science, Segmentation-based object categorization, Image segmentation, Text segmentation, Natural language processing, Artificial intelligence, Machine learning, Test set
DocType: Journal
Volume: abs/1503.05543
Citations: 4
PageRank: 0.42
References: 11
Authors: 2
Name                Order  Citations  PageRank
Alexander A. Alemi  1      70         9.92
Paul Ginsparg       2      4          0.42