Title
DiSCern: A diversified citation recommendation system for scientific queries
Abstract
Performing a literature survey for scholarly activities has become a challenging and time-consuming task due to the rapid growth in the number of scientific articles. Thus, automatic recommendation of high-quality citations for a given scientific query topic is immensely valuable. The state of the art in citation recommendation suffers from three limitations. First, most existing approaches require input in the form of either the full article or a seed set of citations, or both, even though obtaining citation recommendations from a set of keywords alone is extremely useful for many scientific purposes. Second, existing techniques aim at suggesting prestigious and well-cited articles. However, diversified citations for the query topic are often needed; for instance, they help authors write survey papers on a topic and help scholars get a broad view of its key problems. Third, keyword-based citation recommendation typically misses semantically correlated articles that do not use exactly the same keywords. To the best of our knowledge, no known citation recommendation system in the literature addresses these three limitations simultaneously. In this paper, we propose a novel citation recommendation system called DiSCern to address this research gap. DiSCern finds relevant and diversified citations in response to a search query, given as keyword(s) describing the query topic, while using only the citation graph and the keywords associated with the articles, and no latent information. We use a novel keyword expansion step, inspired by community finding in social network analysis, to ensure that semantically correlated articles are also included in the results. Our proposed approach primarily builds on the Vertex Reinforced Random Walk (VRRW) to balance prestige and diversity in the recommended citations. We demonstrate the efficacy of DiSCern empirically on two datasets: a large publication dataset of more than 1.7 million articles in the computer science domain and a dataset of more than 29,000 articles in the theoretical high-energy physics domain. The experimental results show that our proposed approach is quite efficient and that it outperforms the state-of-the-art algorithms in terms of both relevance and diversity.
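As a rough illustration of how a vertex-reinforced random walk can trade prestige against diversity, the sketch below iterates a DivRank-style pointwise approximation on a tiny graph. It is not the DiSCern algorithm itself: the function name `vrrw_scores`, the parameters `damping` and `self_loop`, the undirected toy graph, and the uniform restart distribution are all assumptions made for this example.

```python
def vrrw_scores(adj, restart, damping=0.85, self_loop=0.25, iters=200):
    """Approximate vertex-reinforced random walk scores (illustrative sketch).

    adj:     dict mapping each node to a list of its neighbours
             (assumed undirected, every node present as a key)
    restart: dict mapping each node to its restart probability (sums to 1)
    """
    nodes = sorted(adj)

    # Time-invariant "organic" transition probabilities, with a self-loop
    # on every node so that a visited node can keep reinforcing itself.
    organic = {}
    for u in nodes:
        nbrs = adj[u]
        row = {u: self_loop if nbrs else 1.0}
        for v in nbrs:
            row[v] = row.get(v, 0.0) + (1.0 - self_loop) / len(nbrs)
        organic[u] = row

    # Start from the uniform distribution over nodes.
    score = {u: 1.0 / len(nodes) for u in nodes}

    for _ in range(iters):
        # Restart mass is injected first.
        nxt = {v: (1.0 - damping) * restart.get(v, 0.0) for v in nodes}
        for u in nodes:
            # Reinforcement: each organic transition is re-weighted by the
            # current score of its target, then renormalised per source node.
            norm = sum(p * score[v] for v, p in organic[u].items())
            if norm == 0.0:
                continue  # u and all its targets currently hold no mass
            for v, p in organic[u].items():
                nxt[v] += damping * score[u] * p * score[v] / norm
        score = nxt
    return score


# Toy star graph: one hub "c" linked to three leaves.
graph = {"c": ["l1", "l2", "l3"], "l1": ["c"], "l2": ["c"], "l3": ["c"]}
uniform = {n: 1.0 / len(graph) for n in graph}
scores = vrrw_scores(graph, uniform)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because a node's incoming transitions grow with its own score, a high-scoring node absorbs mass from its neighbours. Selecting the top-ranked nodes therefore tends to pick one representative per dense region rather than several near-duplicates, which is the intuition behind using such a walk for diversified ranking.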
Year: 2015
DOI: 10.1109/ICDE.2015.7113314
Venue: Data Engineering
Keywords: citation analysis, graph theory, query processing, recommender systems, discern, vrrw, citation graph, diversified citation recommendation system, high quality citation automatic recommendation, keyword based citation recommendation, keyword expansion step, scientific articles, scientific query topic, search query, theoretical high-energy physics domain, vertex reinforced random walk, computer science, clustering algorithms, markov processes, mathematical model
Field: Data science, Data mining, Markov process, Computer science, Citation, Cluster analysis, Community finding, Recommender system, Web search query, Information retrieval, Social network analysis, Citation graph, Database
DocType: Conference
ISSN: 1084-4627
Citations: 6
PageRank: 0.46
References: 32
Authors: 4
Name
Order
Citations
PageRank
Tanmoy Chakraborty146676.71
Natwar Modani2718.46
Ramasuri Narayanam321719.37
Seema Nagar460.79