Abstract |
---|
In this paper, we propose a clustering approach based on multi-objective optimization to address the word sense induction problem by leveraging both the content of Web documents and their link structure. Recent work tackles this problem from a content-analysis perspective. However, in this paper, we show that the contents and hyperlinks available on the Web are important and complementary sources of information. Our strategy adapts a simulated annealing algorithm to take into account second-order similarity measures as well as structural information obtained with a PageRank-based similarity kernel. Extensive results on benchmark datasets show that the proposed approach attains better accuracy than either the content-based or the hyperlink-based strategy alone, which encourages combining these sources. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1007/978-3-319-41754-7_36 | Lecture Notes in Computer Science |
Field | DocType | Volume |
---|---|---|
Kernel (linear algebra), Simulated annealing, PageRank, Data mining, Content analysis, Word-sense induction, Computer science, Hyperlink, Cluster analysis | Conference | 9612 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 0 | 0.34 |
References | Authors |
---|---|
11 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sudipta Acharya | 1 | 16 | 5.36 |
Asif Ekbal | 2 | 737 | 119.31 |
Sriparna Saha | 3 | 1064 | 106.07 |
Prabhakaran Santhanam | 4 | 0 | 0.34 |
Jose G. Moreno | 5 | 50 | 10.67 |
Gaël Dias | 6 | 354 | 41.95 |