Title
Boosting Explicit Semantic Analysis by Clustering Paragraph Vectors of Wikipedia Articles.
Abstract
Explicit Semantic Analysis (ESA) is an effective method that uses Wikipedia entries (articles) to represent text and to compute the semantic relatedness (SR) of text pairs. As in ordinary web search, redundancy is a problem for ESA because the number of Wikipedia entries keeps growing. Redundant entries can bias a representation toward the semantics of a large group of similar entries. The original ESA formulation of SR has a second weakness: it ignores the correlations or similarities between the Wikipedia articles that make up the text representations. To address these problems, we develop a novel method that clusters redundant or similar entries using a similarity measure based on Paragraph Vector (PV), a neural network language model. Experiments on four datasets show that our framework achieves higher relatedness accuracy than ESA.
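The abstract does not specify the clustering algorithm or how weights are combined within a cluster. The following Python sketch therefore only illustrates the general idea and rests on several assumptions. It computes ESA relatedness as the cosine similarity between sparse article-weight vectors, which is the standard ESA formulation. In place of a trained PV model it uses hand-made paragraph vectors. It also uses a greedy threshold clustering and takes the maximum weight within each cluster. It is not the authors' method. The function names, the 0.9 threshold, and the toy data are all hypothetical.

```python
import math
from typing import Dict, Hashable, List, Mapping

def dense_cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors (used for paragraph vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sparse_cosine(x: Mapping[Hashable, float], y: Mapping[Hashable, float]) -> float:
    """Cosine similarity between sparse vectors stored as {dimension: weight}."""
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def esa_relatedness(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Plain ESA: each Wikipedia article is its own, independent dimension."""
    return sparse_cosine(x, y)

def cluster_articles(pv: Dict[str, List[float]], threshold: float) -> Dict[str, int]:
    """Hypothetical greedy clustering: an article joins the first cluster whose
    seed article's paragraph vector is at least `threshold` cosine-similar,
    otherwise it becomes the seed of a new cluster."""
    seeds: List[str] = []
    label: Dict[str, int] = {}
    for title in sorted(pv):
        for cid, seed in enumerate(seeds):
            if dense_cosine(pv[title], pv[seed]) >= threshold:
                label[title] = cid
                break
        else:  # no sufficiently similar seed found
            label[title] = len(seeds)
            seeds.append(title)
    return label

def merge_by_cluster(x: Dict[str, float], label: Dict[str, int]) -> Dict[int, float]:
    """Collapse article weights into cluster weights. Keeping the maximum
    (an assumption) makes a group of near-duplicate articles count once."""
    merged: Dict[int, float] = {}
    for title, w in x.items():
        cid = label[title]
        merged[cid] = max(merged.get(cid, 0.0), w)
    return merged

def clustered_relatedness(x: Dict[str, float], y: Dict[str, float],
                          label: Dict[str, int]) -> float:
    """Relatedness computed over clusters of similar articles instead of articles."""
    return sparse_cosine(merge_by_cluster(x, label), merge_by_cluster(y, label))

# Toy, hand-made paragraph vectors (a real system would learn these with PV).
pv = {"Car": [1.0, 0.1], "Automobile": [0.98, 0.12], "Banana": [0.0, 1.0]}
label = cluster_articles(pv, threshold=0.9)

# Two texts whose ESA vectors hit different but near-duplicate articles.
text_a = {"Car": 1.0}
text_b = {"Automobile": 1.0}
print(esa_relatedness(text_a, text_b))               # 0.0
print(clustered_relatedness(text_a, text_b, label))  # 1.0
```

With plain ESA the two texts share no dimension and score 0. After clustering, "Car" and "Automobile" fall into the same cluster and the texts score 1, so the similarity between articles is now taken into account. Taking the maximum weight per cluster also keeps many near-duplicate articles from dominating a vector. Summing their weights would reinforce that emphasis instead.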
Year
2015
DOI
10.1007/978-3-319-25255-1_53
Venue
APWeb
Keywords
Semantic Relatedness, Explicit Semantic Analysis, Paragraph Vector, Clustering
Field
Data mining, Computer science, Explicit semantic analysis, Redundancy (engineering), Paragraph, Natural language processing, Artificial intelligence, Cluster analysis, Language model, Semantic similarity, Information retrieval, Boosting (machine learning), Semantics, Database
DocType
Conference
Volume
9313
ISSN
0302-9743
Citations
0
PageRank
0.34
References
13
Authors
2
Name | Order | Citations | PageRank
Zheng Hai-Tao | 1 | 142 | 24.39
Wu Wenzhen | 2 | 1 | 0.70