Abstract |
---|
State-of-the-art story link detection systems, that is, systems that determine whether two stories are about the same event or linked, are usually based on the cosine-similarity measured between two stories. This paper presents a method for improving the performance of a link detection system by using a variety of similarity measures and using source-pair specific statistical information. The utility of a number of different similarity measures, including cosine, Hellinger, Tanimoto, and clarity, both alone and in combination, was investigated. We also compared several machine learning techniques for combining the different types of information. The techniques investigated were SVMs, voting, and decision trees, each of which makes use of similarity and statistical information differently. Our experimental results indicate that the combination of similarity measures and source-pair specific statistical information using an SVM provides the largest improvement in estimating whether two stories are linked; the resulting system was the best-performing link detection system at TDT-2002. |
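
The similarity measures named in the abstract can be illustrated with a minimal sketch. It uses common textbook definitions over raw term counts and is not the authors' implementation: the tokenization, the absence of term weighting, and the function names (`term_counts`, `cosine`, `tanimoto`, `hellinger`) are all assumptions made for illustration. The clarity measure is left out because the abstract does not give enough detail to sketch it.

```python
import math
from collections import Counter


def term_counts(text):
    """Hypothetical helper: lowercase, whitespace-split term counts."""
    return Counter(text.lower().split())


def _dot(a, b):
    # Sum of products over the terms the two stories share.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / norm if norm else 0.0


def tanimoto(a, b):
    """Tanimoto similarity: dot / (|a|^2 + |b|^2 - dot)."""
    dot = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - dot
    return dot / denom if denom else 0.0


def hellinger(a, b):
    """Hellinger-style similarity between the two term distributions:
    the sum over shared terms of sqrt(p(t) * q(t))."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if not total_a or not total_b:
        return 0.0
    return sum(
        math.sqrt((a[t] / total_a) * (b[t] / total_b))
        for t in a.keys() & b.keys()
    )


if __name__ == "__main__":
    s1 = term_counts("fire breaks out in city center")
    s2 = term_counts("city center fire spreads overnight")
    for measure in (cosine, tanimoto, hellinger):
        print(measure.__name__, round(measure(s1, s2), 3))
```

In a setup like the one the abstract describes, scores of this kind would serve as input features to a combiner such as an SVM, rather than being thresholded individually.
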
Year | Venue | Keywords |
---|---|---|
2004 | HLT-NAACL 2004: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference | machine learning, decision tree |

Field | DocType | Citations |
---|---|---|
Decision tree, CLARITY, Voting, Computer science, Support vector machine, Natural language processing, Artificial intelligence, Machine learning | Conference | 19 |

PageRank | References | Authors |
---|---|---|
1.12 | 7 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Francine Chen | 1 | 1218 | 153.96 |
Ayman Farahat | 2 | 244 | 18.07 |
Thorsten Brants | 3 | 1938 | 190.33 |