Abstract | ||
---|---|---|
Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author's location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate what factors impact on prediction accuracy. First, we evaluate a range of feature selection methods to obtain "location indicative words". We then evaluate the impact of nongeotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1613/jair.4200 | J. Artif. Intell. Res. (JAIR) |
Field | DocType | Volume |
Geospatial analysis, Metadata, Data mining, Location, Feature selection, Information retrieval, Generalization, Geolocation, Exploit, Local search (optimization), Mathematics | Journal | 49 |
Issue | ISSN | Citations |
1 | 1076-9757 | 106 |
PageRank | References | Authors |
2.91 | 63 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bo Han | 1 | 593 | 29.85 |
Paul Cook | 2 | 117 | 3.50 |
Timothy Baldwin | 3 | 452 | 22.18 |