Boosting Explicit Semantic Analysis by Clustering Paragraph Vectors of Wikipedia Articles. - Citegraph

Paper Info

Title
Boosting Explicit Semantic Analysis by Clustering Paragraph Vectors of Wikipedia Articles.

Abstract
Explicit Semantic Analysis (ESA) is an effective method that uses Wikipedia entries (articles) to represent text and compute semantic relatedness (SR) for text pairs. Like ordinary web search techniques, ESA suffers from redundancy as the number of Wikipedia entries keeps growing. Redundant entries can bias the representation, placing particular emphasis on semantics drawn from a large number of similar entries. In addition, the original ESA approach to SR does not consider the correlations or similarities between the Wikipedia articles in the text representations. To tackle these problems, we develop a novel method that clusters redundant or similar entries using a similarity measure based on Paragraph Vector (PV), a neural network language model. Experiments on four datasets show that our framework achieves better relatedness accuracy than ESA.
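
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. The greedy threshold clustering, the max-based merging of weights, the threshold 0.95, and all entry names, vectors, and weights below are assumptions made only for illustration. The toy "paragraph vectors" stand in for vectors a trained PV model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_entries(entry_vectors, threshold):
    """Greedy single-pass clustering (an assumed stand-in for the paper's method).

    Each entry joins the first cluster whose seed vector has cosine
    similarity >= threshold; otherwise it starts a new cluster.
    """
    seeds = []        # seed vector of each cluster, indexed by cluster id
    assignment = {}   # entry name -> cluster id
    for name, vec in entry_vectors.items():
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                assignment[name] = cid
                break
        else:
            assignment[name] = len(seeds)
            seeds.append(vec)
    return assignment


def esa_vector(weights, entries):
    """Plain ESA-style representation: one dimension per Wikipedia entry."""
    return [weights.get(e, 0.0) for e in entries]


def clustered_vector(weights, assignment):
    """One dimension per cluster; keeps the strongest member weight (assumed rule)."""
    merged = [0.0] * (max(assignment.values()) + 1)
    for name, w in weights.items():
        cid = assignment[name]
        merged[cid] = max(merged[cid], w)
    return merged


# Hypothetical paragraph vectors for three Wikipedia entries.
entry_vectors = {
    "Jaguar Cars": [0.9, 0.1],
    "Jaguar XK": [0.85, 0.15],
    "Big cat": [0.1, 0.9],
}
# Hypothetical ESA weights of two short texts over those entries.
text_a = {"Jaguar Cars": 0.8, "Big cat": 0.2}
text_b = {"Jaguar XK": 0.9}

entries = list(entry_vectors)
assignment = cluster_entries(entry_vectors, threshold=0.95)

plain = cosine(esa_vector(text_a, entries), esa_vector(text_b, entries))
clustered = cosine(clustered_vector(text_a, assignment),
                   clustered_vector(text_b, assignment))
print("plain ESA relatedness:", plain)
print("clustered relatedness:", clustered)
```

In this toy setup the two texts activate different but near-duplicate entries, so the plain entry-level comparison finds no overlap, while the cluster-level comparison does. That is the kind of correlation between articles the abstract says the original ESA ignores.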

Year	2015
DOI	10.1007/978-3-319-25255-1_53
Venue	APWeb
Keywords	Semantic Relatedness, Explicit Semantic Analysis, Paragraph Vector, Clustering
Field	Data mining, Computer science, Explicit semantic analysis, Redundancy (engineering), Paragraph, Natural language processing, Artificial intelligence, Cluster analysis, Language model, Semantic similarity, Information retrieval, Boosting (machine learning), Semantics, Database
DocType	Conference
Volume	9313
ISSN	0302-9743
Citations	0
PageRank	0.34
References	13
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Zheng Hai-Tao	1	142	24.39
Wu Wenzhen	2	1	0.70

Cited by (0 rows)

References (13 rows)
