Title | ||
---|---|---|
Using unsupervised link discovery methods to find interesting facts and connections in a bibliography dataset |
Abstract | ||
---|---|---|
This paper describes a submission to the Open Task of the 2003 KDD Cup. For this task contestants were asked to devise their own questions about the HEP-Th bibliography dataset, and the most interesting result would be selected as the winner. Instead of taking a more traditional approach such as starting with an inspection of the data, formulating questions or hypotheses interesting to us and then devising an analysis and approach to answer these questions, we tried to go a different route: can we develop a program that automatically finds interesting facts and connections in the data? To do this we developed a set of unsupervised link discovery methods that compute interestingness based on a notion of "rarity" and "abnormality". The experiments performed on the HEP-Th dataset show that our approaches are able to automatically uncover interesting hidden connections (e.g. significant relationships between people) and unexpected facts (e.g. citation loops) without the support of any prerequisite knowledge or training examples. The interestingness of some of our results is self-evident. For others we were able to verify them by looking for supporting evidence on the World-Wide-Web, which shows that our methods can find, in an unsupervised way, connections between entities that actually are interestingly connected in the real world. |
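
The abstract names citation loops as one kind of unexpected fact. The sketch below illustrates how such loops could be found in a citation graph; it is not the authors' method, which the abstract does not specify. The function name `find_citation_loops`, the `cites` mapping (paper id to the set of paper ids it cites), and the `max_len` cutoff are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def find_citation_loops(
    cites: Dict[str, Set[str]], max_len: int = 4
) -> List[Tuple[str, ...]]:
    """Return simple directed cycles containing at most max_len papers.

    Hypothetical helper: `cites` maps a paper id to the ids it cites.
    Each cycle is reported once, starting at its smallest paper id.
    """
    loops: List[Tuple[str, ...]] = []

    def walk(start: str, node: str, path: List[str]) -> None:
        for nxt in sorted(cites.get(node, ())):
            if nxt == start:
                # The path closes back on its starting paper: a loop.
                loops.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                # Only visit ids larger than start so that each loop
                # is found from its smallest member exactly once.
                walk(start, nxt, path + [nxt])

    for start in sorted(cites):
        walk(start, start, [start])
    return loops


# Toy data, invented for illustration only.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"A"}, "D": {"A"}}
loops = find_citation_loops(toy)
```

In a real bibliography, a loop is surprising because a paper can normally cite only work that already exists, so any cycle hints at simultaneous preprints, revised versions, or data errors. The `max_len` bound keeps the depth-first search affordable on large graphs.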
Year | DOI | Venue |
---|---|---|
2003 | 10.1145/980972.981000 | SIGKDD Explorations |
Keywords | Field | DocType |
citation loop,traditional approach,open task,kdd cup,hep-th dataset show,interesting fact,hep-th bibliography dataset,unsupervised link discovery method,interesting hidden connection,interesting result,hypotheses,questionnaires,rule based systems,world wide web,inspection,patterns,internet | Data science,Data mining,Rule-based system,Computer science,Citation,Bibliography,The Internet | Journal |
Volume | Issue | Citations |
5 | 2 | 11 |
PageRank | References | Authors |
0.97 | 2 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shou-De Lin | 1 | 706 | 84.81 |
Hans Chalupsky | 2 | 358 | 44.48 |