Title
Top-k publish-subscribe for social annotation of news
Abstract
Social content, such as Twitter updates, often has the quickest first-hand reports of news events, as well as numerous commentaries that are indicative of the public view of such events. As such, social updates provide a good complement to professionally written news articles. In this paper we consider the problem of automatically annotating news stories with social updates (tweets) at a news website serving a high volume of pageviews. The high rate of both the pageviews (millions to billions a day) and the incoming tweets (more than 100 million a day) makes real-time indexing of tweets ineffective, as this requires an index that is both queried and updated extremely frequently. The rate of tweet updates makes caching techniques almost unusable, since the cache would become stale very quickly. We propose a novel architecture where each story is treated as a subscription for tweets relevant to the story's content, and new algorithms that efficiently match tweets to stories, proactively maintaining the top-k tweets for each story. Such top-k pub-sub consumes only a small fraction of the resource cost of alternative solutions and can be applied to other large-scale content-based publish-subscribe problems. We demonstrate the effectiveness of our approach on real-world data: a corpus of news stories from Yahoo! News and a log of Twitter updates.
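The idea in the abstract of treating each story as a standing subscription and keeping its top-k tweets up to date can be illustrated with a small sketch. This is not the paper's algorithm: the class name TopKPubSub, the whitespace tokenizer, and the term-overlap score are all assumptions chosen only for illustration, and a real system would use a proper relevance function and a far more efficient matching index.

```python
import heapq
from collections import defaultdict


class TopKPubSub:
    """Toy top-k pub-sub (hypothetical): stories subscribe, tweets are published."""

    def __init__(self, k):
        self.k = k
        self.term_index = defaultdict(set)  # term -> ids of stories containing it
        self.top = defaultdict(list)        # story id -> min-heap of (score, seq, tweet)
        self.seq = 0                        # arrival counter; newer tweets win score ties

    def subscribe(self, story_id, text):
        # Index the story's terms so incoming tweets can find candidate stories.
        for term in set(text.lower().split()):
            self.term_index[term].add(story_id)

    def publish(self, tweet):
        # Score = number of distinct terms shared with the story (an assumed,
        # deliberately naive relevance measure).
        self.seq += 1
        overlap = defaultdict(int)
        for term in set(tweet.lower().split()):
            for story_id in self.term_index.get(term, ()):
                overlap[story_id] += 1
        for story_id, score in overlap.items():
            heap = self.top[story_id]
            entry = (score, self.seq, tweet)
            if len(heap) < self.k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # Evict the current weakest tweet for this story.
                heapq.heapreplace(heap, entry)

    def annotations(self, story_id):
        # Best-first list of the tweets currently kept for the story.
        return [tweet for _, _, tweet in sorted(self.top.get(story_id, []), reverse=True)]
```

In this sketch the per-story result is maintained at publish time, so serving a pageview is a lookup of an already computed list rather than a query against a constantly updated tweet index.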
Year
2013
DOI
10.14778/2536336.2536340
Venue
PVLDB
Keywords
news story, high volume, social annotation, news website, social content, news event, high rate, news article, top-k publish-subscribe, annotating news story, tweet updates, social updates
Field
Publication, Data mining, World Wide Web, Architecture, Annotation, Computer science, Cache, Search engine indexing, Page view, Database
DocType
Journal
Volume
6
Issue
6
ISSN
2150-8097
Citations
24
PageRank
0.94
References
18
Authors
4
Name               Order  Citations  PageRank
Alexander Shraer   1      408        22.85
Maxim Gurevich     2      347        18.96
Marcus Fontoura    3      1116       61.74
Vanja Josifovski   4      2265       148.84