Abstract | ||
---|---|---|
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is further made inconsistent, and even unrepresentative of real-world usage, by the lack of distinction between the different types of toponyms, which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/s10579-019-09475-3 | LANGUAGE RESOURCES AND EVALUATION |
Keywords | DocType | Volume |
Geoparsing, Toponym resolution, Geotagging, Geocoding, Named Entity Recognition, Machine learning, Evaluation framework, Geonames, Toponyms, Natural language understanding, Pragmatics | Journal | 54 |
Issue | ISSN | Citations |
3 | 1574-020X | 2 |
PageRank | References | Authors |
0.40 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Milan Gritta | 1 | 2 | 1.75 |
Mohammad Taher Pilehvar | 2 | 376 | 25.70 |
Nigel Collier | 3 | 18 | 5.07 |