Title
Georeferencing Wikipedia Documents Using Data from Social Media Sources
Abstract
Social media sources such as Flickr and Twitter continuously generate large amounts of textual information (tags on Flickr and short messages on Twitter). This textual information is increasingly linked to geographical coordinates, which makes it possible to learn how people refer to places by identifying correlations between the occurrence of terms and the locations of the corresponding social media objects. Recent work has focused on how this potentially rich source of geographic information can be used to estimate geographic coordinates for previously unseen Flickr photos or Twitter messages. In this article, we extend this work by analysing to what extent probabilistic language models trained on Flickr and Twitter can be used to assign coordinates to Wikipedia articles. Our results show that exploiting these language models substantially outperforms both (i) classical gazetteer-based methods (in particular, using Yahoo! Placemaker and Geonames) and (ii) language modelling approaches trained on Wikipedia alone. This supports the hypothesis that social media are important sources of geographic information, which are valuable beyond the scope of individual applications.
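The abstract describes estimating coordinates by correlating terms with the locations of geotagged social media items. The following is a minimal sketch of that idea under stated assumptions, not the article's method. It assumes a fixed latitude/longitude grid, one multinomial unigram model per grid cell with add-alpha (Laplace) smoothing, and the mean of a cell's training coordinates as the predicted point. `GridLanguageModel`, `cell_of`, `CELL_SIZE` and the toy training data are hypothetical. The article's actual choices for defining areas, smoothing, and selecting a point within an area may differ.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 1.0  # hypothetical grid resolution, in degrees


def cell_of(lat, lon, size=CELL_SIZE):
    """Map a coordinate to a (row, col) grid-cell index."""
    return (math.floor(lat / size), math.floor(lon / size))


class GridLanguageModel:
    """Hypothetical per-cell unigram language model (multinomial naive Bayes)."""

    def __init__(self, alpha=1.0, size=CELL_SIZE):
        self.alpha = alpha                       # add-alpha smoothing constant
        self.size = size
        self.term_counts = defaultdict(Counter)  # cell -> term frequencies
        self.doc_counts = Counter()              # cell -> number of training items
        self.coords = defaultdict(list)          # cell -> training coordinates
        self.vocab = set()

    def fit(self, items):
        """items: iterable of (terms, (lat, lon)) pairs, e.g. Flickr tags or tweet tokens."""
        for terms, (lat, lon) in items:
            c = cell_of(lat, lon, self.size)
            self.term_counts[c].update(terms)
            self.doc_counts[c] += 1
            self.coords[c].append((lat, lon))
            self.vocab.update(terms)
        return self

    def log_score(self, cell, terms):
        """log P(cell) + sum of log P(term | cell), with add-alpha smoothing."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[cell] / total_docs)
        counts = self.term_counts[cell]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for t in terms:
            if t in self.vocab:  # terms never seen in training are ignored
                score += math.log((counts[t] + self.alpha) / denom)
        return score

    def predict(self, terms):
        """Return the best-scoring cell and the mean of its training coordinates."""
        best = max(self.doc_counts, key=lambda c: self.log_score(c, terms))
        pts = self.coords[best]
        lat = sum(p[0] for p in pts) / len(pts)
        lon = sum(p[1] for p in pts) / len(pts)
        return best, (lat, lon)


# Toy, invented training data standing in for geotagged social media items.
training = [
    (["eiffel", "tower", "paris"], (48.858, 2.294)),
    (["louvre", "paris", "museum"], (48.861, 2.336)),
    (["big", "ben", "london"], (51.500, -0.124)),
    (["tower", "bridge", "london"], (51.505, -0.075)),
]
model = GridLanguageModel().fit(training)

# Treat the terms of an unseen document (e.g. a Wikipedia article) as the query.
cell, (lat, lon) = model.predict(["paris", "museum", "tower"])
print(cell, round(lat, 4), round(lon, 4))
```

Both toy cells have the same prior, so the decision comes from the term likelihoods. "paris" and "museum" occur only in the Paris cell, while "tower" occurs in both, so the Paris cell scores higher. Working in log space avoids underflow when a document contains many terms.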
Year
2014
DOI
10.1145/2629685
Venue
ACM Trans. Inf. Syst.
Keywords
experimentation, language models, measurement, information search and retrieval, geographic information retrieval, learning, semistructured data
Field
Data mining, World Wide Web, Social media, Information retrieval, Computer science, Textual information, Geographic coordinate system, Georeference, Geographic information retrieval, Probabilistic logic, Language modelling, Language model
DocType
Journal
Volume
32
Issue
3
ISSN
1046-8188
Citations
7
PageRank
0.47
References
28
Authors
5
Name                  Order  Citations  PageRank
Olivier Van Laere     1      164        12.75
Steven Schockaert     2      583        57.95
Vlad Tanasescu        3      172        12.49
Bart Dhoedt           4      1286       132.92
Christopher B. Jones  5      1067       95.29