Title
Resource-adaptive real-time new event detection
Abstract
In a document streaming environment, online detection of the first documents that mention previously unseen events is an open challenge. For this online new event detection (ONED) task, existing studies usually assume that enough resources are always available and focus entirely on detection accuracy without considering efficiency. Moreover, none of the existing work addresses the issue of providing an effective and friendly user interface. As a result, there is a significant gap between the existing systems and a system that can be used in practice. In this paper, we propose an ONED framework with the following prominent features. First, a combination of indexing and compression methods is used to improve the document processing rate by orders of magnitude without sacrificing much detection accuracy. Second, when resources are tight, a resource-adaptive computation method is used to maximize the benefit that can be gained from the limited resources. Third, when the new event arrival rate is beyond the processing capability of the consumer of the ONED system, new events are further filtered and prioritized before they are presented to the consumer. Fourth, implicit citation relationships are created among all the documents and used to compute the importance of document sources. This importance information can guide the selection of document sources. We implemented a prototype of our framework on top of IBM's Stream Processing Core middleware. We also evaluated the effectiveness of our techniques on the standard TDT5 benchmark. To the best of our knowledge, this is the first implementation of a real application in a large-scale stream processing system.
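The core ONED task described in the abstract can be illustrated with a minimal sketch. This is not the paper's framework: it assumes a common baseline in which a document is flagged as a new event when its highest cosine similarity to any earlier document falls below a cutoff, and it uses a plain inverted index so that only documents sharing a term are compared. The class name `NewEventDetector`, the `threshold` value, and the raw term-frequency weighting are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


class NewEventDetector:
    """Toy online new event detector (illustrative baseline, not the paper's method).

    A document is reported as a new event when its best cosine similarity
    to every previously seen document is below `threshold`.
    """

    def __init__(self, threshold=0.2):
        self.threshold = threshold      # hypothetical cutoff, not taken from the paper
        self.docs = []                  # unit-normalized term vectors of past documents
        self.index = defaultdict(set)   # inverted index: term -> ids of docs containing it

    def _vectorize(self, text):
        # Raw term frequencies, scaled to unit length (no IDF, no stemming).
        counts = Counter(text.lower().split())
        norm = math.sqrt(sum(c * c for c in counts.values()))
        return {t: c / norm for t, c in counts.items()} if norm else {}

    def process(self, text):
        """Return True if the document looks like the first report of an event."""
        vec = self._vectorize(text)

        # Only documents that share at least one term can have nonzero similarity,
        # so the inverted index narrows the set of comparisons.
        candidates = set()
        for term in vec:
            candidates |= self.index[term]

        best = 0.0
        for doc_id in candidates:
            other = self.docs[doc_id]
            sim = sum(w * other.get(t, 0.0) for t, w in vec.items())
            best = max(best, sim)

        # Store the document so later arrivals are compared against it.
        doc_id = len(self.docs)
        self.docs.append(vec)
        for term in vec:
            self.index[term].add(doc_id)

        return best < self.threshold
```

Feeding documents one at a time through `process` mimics the streaming setting: the first report of a story returns `True`, and a near-duplicate follow-up returns `False`. A practical system would also need to bound the stored history, which is where the compression and resource-adaptive techniques mentioned in the abstract would come in.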
Year
2007
DOI
10.1145/1247480.1247536
Venue
SIGMOD Conference
Keywords
existing system,online new event detection,document source,large-scale stream processing system,detection accuracy,resource-adaptive real-time new event,existing work,oned system,oned framework,online detection,document processing rate,user interface,document processing,process capability,middleware,real time,stream processing
Field
Middleware,Data mining,IBM,Computer science,Citation,Document processing,Search engine indexing,User interface,Stream processing,Database,Computation
DocType
Conference
Citations
29
PageRank
1.65
References
32
Authors
3
Name             Order   Citations   PageRank
Gang Luo         1       741         44.73
Chunqiang Tang   2       1287        75.09
Philip S. Yu     3       30670       3474.16