Title
Mining a Multilingual Geographical Gazetteer from the Web
Abstract
Geographical gazetteers are necessary in a wide variety of applications. In the past, the construction of such gazetteers has been a tedious, manual process, and only recently have the first attempts been made to automate gazetteer creation. Here we describe our approach for mining accurate yet large-scale multilingual geographic information by successively filtering information found in heterogeneous data sources (Flickr, Wikipedia, Panoramio, Web pages indexed by search engines). By statistically cross-checking the information found in each source, we are able to identify new geographic objects and to indicate, for each one, its name, its GPS coordinates, its encompassing regions (city, region, country), the language of the name, its popularity, and the type of the object (church, bridge, etc.). We evaluate our approach by comparing, wherever possible, our multilingual gazetteer to other known attempts at automatically building a geographic database and to Geonames, a manually built gazetteer.
Year
2009
DOI
10.1109/WI-IAT.2009.16
Venue
Web Intelligence
Keywords
known attempt, geographical gazetteer, statistically cross-checking information, multilingual gazetteer, encompassing region, geographic database, large-scale multilingual geographic information, heterogeneous data source, multilingual geographical gazetteer, new geographic object, gazetteers creation, wikipedia, search engines, indexation, place names, search engine, databases, global positioning system, categorization, web pages
Field
Toponymy, Categorization, Data mining, World Wide Web, Information retrieval, Web page, Computer science, Popularity, Geographic coordinate system, Geographic database, Global Positioning System
DocType
Conference
Citations
11
PageRank
0.62
References
8
Authors
3
Name                  Order  Citations  PageRank
Adrian Popescu        1      263        20.15
Gregory Grefenstette  2      1129       147.00
Houda Bouamor         3      88         17.62