Title
Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique.
Abstract
Social media offers a wealth of insight into how significant events, such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing, affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
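The distance time series described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the discriminative words have already been selected (the `vocabulary` argument stands in for the output of CWC, which is not reproduced here), uses cosine distance because the abstract does not name a metric, and marks an event span with a hypothetical fixed `threshold`. All function names are illustrative.

```python
import math
from collections import Counter


def slot_vector(tweets, vocabulary):
    """Count how often each selected word occurs in one time slot's tweets."""
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_distance(u, v):
    """Cosine distance; assumed metric, since the abstract does not specify one."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty slot as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def distance_series(slots, vocabulary):
    """Distance of every slot's word vector from the first (initial) slot."""
    vectors = [slot_vector(tweets, vocabulary) for tweets in slots]
    base = vectors[0]
    return [cosine_distance(base, vec) for vec in vectors]


def event_span(series, threshold):
    """First and last slot index whose distance exceeds the assumed threshold."""
    above = [i for i, dist in enumerate(series) if dist > threshold]
    return (above[0], above[-1]) if above else None
```

With one list of tweets per hourly slot, `distance_series(slots, vocabulary)` yields one distance per slot, and `event_span(series, threshold)` reports the range of slots in which the distance stays above the chosen threshold. Restricting the vectors to a small selected vocabulary is what keeps each slot's vector short, which is the role the abstract assigns to feature selection.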
Year
2015
DOI
10.1109/ICDMW.2015.248
Venue
ICDM Workshops
Keywords
Event detection, Social media analysis, Big data analysis, Feature selection, Great East Japan Earthquake
Field
Data mining, Time series, Latent Dirichlet allocation, Social media, Feature selection, Computer science, Feature extraction, Artificial intelligence, Cluster analysis, Big data, Discriminative model, Machine learning
DocType
Conference
Citations
3
PageRank
0.38
References
5
Authors (4)
Name               Order   Citations   PageRank
Takako Hashimoto   1       50          18.47
Dave Shepard       2       7           0.83
Tetsuji Kuboyama   3       140         29.36
Kilho Shin         4       89          10.44