Content Based Recommendation and Summarization in the Blogosphere - Citegraph

Paper Info

Title
Content Based Recommendation and Summarization in the Blogosphere

Abstract
This paper presents a stochastic graph-based method for recommending or selecting a small subset of blogs that best represents a much larger set within a certain topic. Each blog is assigned a score that reflects how representative it is. Blog scores are calculated recursively in terms of the scores of their neighbors in a lexical similarity graph. A random walk is performed on a graph where nodes represent blogs and edges link lexically similar blogs. Lexical similarity is measured using either the cosine similarity measure or the Kullback-Leibler (KL) divergence. In addition, the presented method combines lexical centrality with information novelty to reduce redundancy in ranked blogs. Blogs similar to highly ranked blogs are discounted to make sure that diversity is maintained in the final rank. The presented method also allows additional initial quality priors to be included to assess the quality of the blogs, such as the frequency of new posts per day and the text fluency measured by n-gram model probabilities. We evaluate our approach using data from two large blog datasets. We measure the selection quality by the number of blogs covered in the network as calculated by an information diffusion model. We compare our method to other heuristic and greedy selection methods and show that it significantly outperforms them.
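
The random walk over a lexical similarity graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the similarity `threshold`, the `damping` factor, the uniform handling of isolated nodes, and the function names `cosine` and `rank_blogs` are all assumptions made for illustration. The novelty discounting and quality priors mentioned in the abstract are left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_blogs(texts, threshold=0.1, damping=0.85, iters=100):
    """Score each text by a damped random walk on a cosine-similarity graph.

    threshold and damping are illustrative assumptions, not values
    taken from the paper.
    """
    vecs = [Counter(t.lower().split()) for t in texts]
    n = len(vecs)
    # Edge weight = cosine similarity; drop weak edges and self-loops.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(vecs[i], vecs[j])
                if sim >= threshold:
                    w[i][j] = sim
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            out = sum(w[i])
            for j in range(n):
                # A node with no neighbors spreads its score uniformly.
                share = w[i][j] / out if out else 1.0 / n
                new[j] += damping * scores[i] * share
        scores = new
    return scores
```

Calling `rank_blogs` on a list of blog texts returns one score per text, and the scores sum to 1. A text that shares no vocabulary with the others has no edges and only receives the uniform baseline share, so it ranks below texts that sit in a cluster of similar neighbors.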

Year	Venue	Keywords
2009	ICWSM	kullback leibler,random walk
Field	DocType	Citations
Data mining,Lexical similarity,Automatic summarization,Heuristic,Cosine similarity,Ranking,Computer science,Centrality,Blogosphere,Prior probability	Conference	8
PageRank	References	Authors
0.56	20	4

Authors (4 rows)

Name	Order	Citations	PageRank
Ahmed E. Hassan	1	5959	287.68
Dragomir Radev	2	5167	374.13
Junghoo Cho	3	3088	584.54
Amruta Joshi	4	187	8.67
