Title
Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora
Abstract
An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document FrequencyTempo (TF×IDFTempo) and TF×Enhanced-IDFTempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDFTempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDFTempo.
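To make the benchmark named in the abstract concrete, the following is a minimal sketch of generic TF×IDF weighting followed by hierarchical agglomerative clustering. It is not the authors' implementation: the average-link merge rule, the cosine similarity measure, the stopping threshold, the function names, and the toy documents are all assumptions made for illustration. The proposed TF×IDFTempo and TF×Enhanced-IDFTempo metrics are not sketched, because the abstract does not define them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF x IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerate(vectors, threshold):
    """Average-link agglomerative clustering (assumed linkage).

    Repeatedly merges the most similar pair of clusters and stops once the
    best available similarity falls below `threshold` (an assumed parameter).
    Returns clusters as lists of document indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def similarity(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, (i, j) = max(
            (similarity(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy corpus (hypothetical): two news stories about each of two events.
docs = [
    ["quake", "hits", "coast"],
    ["quake", "aftershock", "coast"],
    ["merger", "talks", "bank"],
    ["bank", "merger", "approved"],
]
clusters = agglomerate(tfidf_vectors(docs), threshold=0.2)
print(sorted(sorted(c) for c in clusters))
```

In this sketch each resulting cluster plays the role of one event episode. A temporal variant would change only how the term weights are computed, leaving the clustering loop as it is.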
Year: 2014
DOI: 10.1002/asi.22995
Venue: Periodicals
Keywords: text mining
Field: Hierarchical clustering, Data mining, Text mining, Information retrieval, Feature selection, Computer science, Document representation, Recall
DocType: Journal
Volume: 65
Issue: 3
ISSN: 2330-1635
Citations: 6
PageRank: 0.44
References: 12
Authors: 5
Name, Order, Citations, PageRank
Chih-ping Wei, 1, 743, 74.20
Yen-hsien Lee, 2, 118, 16.64
Yu-Sheng Chiang, 3, 8, 0.86
Chun-Ta Chen, 4, 30, 6.33
Christopher C. Yang, 5, 1590, 138.09