Abstract | | |
---|---|---|
Assessing semantic similarity between text documents is a crucial aspect of Information Retrieval systems. In this work, we propose to use hyperlink information to derive a similarity measure that can then be applied to compare any text documents, with or without hyperlinks. As linked documents are generally semantically closer than unlinked documents, we use a training corpus with hyperlinks to infer a function (a, b) → sim(a, b) that assigns a higher value to linked documents than to unlinked ones. Two sets of experiments on different corpora show that this function compares favorably with OKAPI matching on document retrieval tasks. |
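
The idea of learning sim(a, b) so that linked pairs score higher than unlinked ones can be illustrated with a small sketch. This is not the authors' model: it assumes, purely for illustration, bag-of-words documents stored as dictionaries, a similarity defined as a dot product with one learned weight per term, and a margin-based ranking loss minimized by gradient descent. The names `sim`, `train`, and the toy documents are hypothetical.

```python
def sim(weights, a, b):
    """Weighted dot product of two bag-of-words dicts (term -> value).

    Terms without a learned weight default to 1.0 (assumption of this sketch).
    """
    return sum(weights.get(t, 1.0) * v * b[t] for t, v in a.items() if t in b)


def train(triples, epochs=50, lr=0.1, margin=1.0):
    """Learn per-term weights from (a, linked, unlinked) triples.

    Minimizes the hinge loss max(0, margin - sim(a, linked) + sim(a, unlinked))
    with plain stochastic gradient descent.
    """
    weights = {}
    for _ in range(epochs):
        for a, pos, neg in triples:
            if sim(weights, a, pos) - sim(weights, a, neg) >= margin:
                continue  # this triple is already ranked with enough margin
            for t, v in a.items():
                # d(loss)/d(w_t) = a_t * (neg_t - pos_t)
                grad = v * (neg.get(t, 0.0) - pos.get(t, 0.0))
                if grad != 0.0:
                    weights[t] = weights.get(t, 1.0) - lr * grad
    return weights


# Toy data: the unlinked document shares only a frequent, uninformative term.
query = {"neural": 1.0, "network": 1.0, "the": 1.0}
linked = {"neural": 1.0, "gradient": 1.0, "the": 1.0}
unlinked = {"weather": 1.0, "the": 2.0}

weights = train([(query, linked, unlinked)])
print(sim(weights, query, linked), sim(weights, query, unlinked))
```

In this toy setting, training raises the weight of the term shared only with the linked document and lowers the weight of the term that dominates the unlinked one. Because the learned weights attach to terms rather than to links, the resulting function can score documents that carry no hyperlinks.
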
Year | DOI | Venue |
---|---|---|
2005 | 10.1145/1099554.1099666 | CIKM |
Keywords | Field | DocType |
semantic similarity, crucial aspect, okapi matching, document retrieval task, text document, information retrieval system, inferring document similarity, higher value, similarity measure, unlinked document, different corpora show, gradient descent, document retrieval, speech, neural network, neural networks, hyperlinks | Semantic similarity, Data mining, Gradient descent, Similarity measure, Information retrieval, Computer science, Hyperlink, Document retrieval, Artificial neural network, Document similarity | Conference |
ISBN | Citations | PageRank |
1-59593-140-6 | 8 | 0.89 |
References | Authors | |
10 | 2 | |
Name | Order | Citations | PageRank |
---|---|---|---|
David Grangier | 1 | 816 | 41.60 |
Samy Bengio | 2 | 7213 | 485.82 |