Title
Learning to Extract Local Events from the Web
Abstract
The goal of this work is extraction and retrieval of local events from web pages. Examples of local events include small venue concerts, theater performances, garage sales, movie screenings, etc. We collect these events in the form of retrievable calendar entries that include structured information about event name, date, time and location. Between existing information extraction techniques and the availability of information on social media and semantic web technologies, there are numerous ways to collect commercial, high-profile events. However, most extraction techniques require domain-level supervision, which is not attainable at web scale. Similarly, while the adoption of the semantic web has grown, there will always be organizations without the resources or the expertise to add machine-readable annotations to their pages. Therefore, our approach bootstraps these explicit annotations to massively scale up local event extraction. We propose a novel event extraction model that uses distant supervision to assign scores to individual event fields (event name, date, time and location) and a structural algorithm to optimally group these fields into event records. Our model integrates information from both the entire source document and its relevant sub-regions, and is highly scalable. We evaluate our extraction model on all 700 million documents in a large publicly available web corpus, ClueWeb12. Using the 217,000 unique explicitly annotated events as distant supervision, we are able to double recall with 85% precision and quadruple it with 65% precision, with no additional human supervision. We also show that our model can be bootstrapped for a fully supervised approach, which can further improve the precision by 30%. In addition, we evaluate the geographic coverage of the extracted events. We find that there is a significant increase in the geo-diversity of extracted events compared to existing explicit annotations, while maintaining high precision levels.
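The abstract describes a two-stage idea: score individual event fields, then group them into event records. The sketch below illustrates only the grouping step, under loudly stated assumptions. The `Candidate` type, the product-of-scores objective, the `region` identifier, and the 0.5 cross-region penalty are all hypothetical choices made for illustration; the abstract does not specify the paper's actual scoring function or structural algorithm.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple

# The four event fields named in the abstract.
FIELDS = ("name", "date", "time", "location")


@dataclass(frozen=True)
class Candidate:
    field: str    # one of FIELDS
    text: str     # extracted text span
    score: float  # hypothetical field-classifier confidence in [0, 1]
    region: int   # hypothetical id of the page sub-region holding the span


def best_record(cands: List[Candidate]) -> Optional[Tuple[float, Dict[str, str]]]:
    """Pick one candidate per field so that an assumed joint score is maximal.

    Assumed objective (not from the paper): product of field scores,
    halved for every extra sub-region the record spans.
    """
    by_field = {f: [c for c in cands if c.field == f] for f in FIELDS}
    if any(not options for options in by_field.values()):
        return None  # at least one field has no candidate at all

    best: Optional[Tuple[float, Dict[str, str]]] = None
    # Exhaustive search is fine for a handful of candidates per page region.
    for combo in product(*(by_field[f] for f in FIELDS)):
        score = 1.0
        for c in combo:
            score *= c.score
        regions = {c.region for c in combo}
        score *= 0.5 ** (len(regions) - 1)  # assumed cross-region penalty
        if best is None or score > best[0]:
            best = (score, {c.field: c.text for c in combo})
    return best


if __name__ == "__main__":
    page = [
        Candidate("name", "Jazz Night", 0.9, region=1),
        Candidate("name", "Home", 0.3, region=0),
        Candidate("date", "May 3", 0.8, region=1),
        Candidate("time", "8 pm", 0.7, region=1),
        Candidate("location", "Blue Room", 0.6, region=1),
        Candidate("location", "City Hall", 0.7, region=2),
    ]
    print(best_record(page))
```

In this toy page the higher-scoring location "City Hall" loses to "Blue Room" because it sits in a different sub-region from the other fields, which is the kind of trade-off between field-level scores and page structure that a grouping step has to resolve.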
Year
2015
DOI
10.1145/2766462.2767739
Venue
International Conference on Research and Development in Information Retrieval
Keywords
Information Retrieval, Information Extraction
Field
Data mining, World Wide Web, Social media, Web page, Information retrieval, Computer science, Bootstrapping, Semantic Web, Information extraction, Recall, Scalability
DocType
Conference
Citations
14
PageRank
0.75
References
28
Authors
3
Name, Order, Citations, PageRank
John Foley, 1, 57, 8.90
Michael Bendersky, 2, 986, 48.69
Vanja Josifovski, 3, 2265, 148.84