Title
Multiple Similarity Measures And Source-Pair Information In Story Link Detection
Abstract
State-of-the-art story link detection systems, that is, systems that determine whether two stories are about the same event or linked, are usually based on the cosine-similarity measured between two stories. This paper presents a method for improving the performance of a link detection system by using a variety of similarity measures and using source-pair specific statistical information. The utility of a number of different similarity measures, including cosine, Hellinger, Tanimoto, and clarity, both alone and in combination, was investigated. We also compared several machine learning techniques for combining the different types of information. The techniques investigated were SVMs, voting, and decision trees, each of which makes use of similarity and statistical information differently. Our experimental results indicate that the combination of similarity measures and source-pair specific statistical information using an SVM provides the largest improvement in estimating whether two stories are linked; the resulting system was the best-performing link detection system at TDT-2002.
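As a rough illustration of the kind of per-pair similarity features the abstract describes, the sketch below computes cosine, Hellinger, and Tanimoto similarity between two stories represented as raw term counts. It assumes the standard textbook definitions (Hellinger as the sum of square roots of the products of relative term frequencies), which may differ from the weighting and smoothing used in the paper. The clarity measure is omitted because it needs a background language model. The names term_counts and similarity_features are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def term_counts(text):
    """Whitespace-tokenized, lowercased term counts (an assumed, minimal preprocessing)."""
    return Counter(text.lower().split())


def _dot(a, b):
    # Only terms shared by both stories contribute to the dot product.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine(a, b):
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return _dot(a, b) / (na * nb) if na and nb else 0.0


def tanimoto(a, b):
    dot = _dot(a, b)
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0


def hellinger(a, b):
    # Similarity form: sum over shared terms of sqrt(p(t|a) * p(t|b)).
    ta, tb = sum(a.values()), sum(b.values())
    if not ta or not tb:
        return 0.0
    return sum(math.sqrt((a[t] / ta) * (b[t] / tb)) for t in a.keys() & b.keys())


def similarity_features(story1, story2):
    """Return [cosine, Hellinger, Tanimoto] for one story pair."""
    a, b = term_counts(story1), term_counts(story2)
    return [cosine(a, b), hellinger(a, b), tanimoto(a, b)]
```

A feature vector like this, possibly extended with statistics specific to the pair of sources the two stories come from, is the sort of input that a classifier such as an SVM could be trained on to decide whether the pair is linked.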
Year
2004
Venue
HLT-NAACL 2004: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE
Keywords
machine learning, decision tree
Field
Decision tree, CLARITY, Voting, Computer science, Support vector machine, Natural language processing, Artificial intelligence, Machine learning
DocType
Conference
Citations
19
PageRank
1.12
References
7
Authors
3
Name              Order  Citations  PageRank
Francine Chen     1      1218       153.96
Ayman Farahat     2      244        18.07
Thorsten Brants   3      1938       190.33