Title | ||
---|---|---|
An Automatic Approach for Discovering and Geocoding Locations in Domain-Specific Web Data (Application Paper). |
Abstract | ||
---|---|---|
We present an automatic approach for discovering location names in WWW data culled from diverse domains. Our approach builds upon the Apache Tika, Apache OpenNLP, and Apache Lucene frameworks. Tika is used to extract text and metadata from any file. The text and metadata are provided to Apache OpenNLP and its location entity extraction model. The discovered location entities are then delivered to a gazetteer indexed in Apache Lucene derived from the Geonames.org dataset. This paper describes the overall approach and then explains in detail the challenges we faced, and the methodology that we employed to overcome them. We describe the evolution of our geo gazetteer process and algorithm and demonstrate the approach's accuracy in data collected in the DARPA MEMEX and NSF Polar Cyber Infrastructure efforts. |
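
The three-stage flow described in the abstract (text extraction, location entity extraction, gazetteer lookup) can be sketched roughly as below. This is a minimal illustration only, not the authors' implementation: it does not call Apache Tika, OpenNLP, or Lucene. Every function name, the capitalized-word heuristic, and the two-entry gazetteer with approximate coordinates are hypothetical stand-ins chosen to show how the stages hand data to one another.

```python
import re

# Hypothetical toy gazetteer: lowercased place name -> approximate (lat, lon).
# In the paper this role is played by a Lucene index built from Geonames.org.
GAZETTEER = {
    "pasadena": (34.15, -118.14),
    "los angeles": (34.05, -118.24),
}


def extract_text(raw):
    """Stand-in for the Tika stage: turn raw file bytes into plain text."""
    return raw.decode("utf-8", errors="replace")


def find_location_candidates(text):
    """Stand-in for the OpenNLP stage: a naive heuristic (an assumption,
    not a trained model) that treats runs of capitalized words as candidates."""
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)


def geocode(candidates):
    """Stand-in for the gazetteer stage: exact match on the lowercased name."""
    resolved = {}
    for name in candidates:
        coords = GAZETTEER.get(name.lower())
        if coords is not None:
            resolved[name] = coords
    return resolved


def discover_locations(raw):
    """Chain the three stages: extract text, find candidates, geocode them."""
    return geocode(find_location_candidates(extract_text(raw)))


if __name__ == "__main__":
    print(discover_locations(b"The lab is in Pasadena near Los Angeles."))
```

Candidates that the gazetteer does not know (here, the capitalized word "The") are simply dropped, which shows why the gazetteer stage also acts as a filter on noisy entity extraction.
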
Year | Venue | Field |
---|---|---|
2016 | IRI | Data mining,Metadata,World Wide Web,Geocoding,Memex,Computer science,Cyber infrastructure,Artificial intelligence,Machine learning |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chris A. Mattmann | 1 | 200 | 25.39 |
Madhav Sharan | 2 | 0 | 0.68 |