Abstract |
---|
Nowadays, large collections of photos are tagged with GPS coordinates. The modelling of such large geo-tagged corpora is an important problem in data mining and information retrieval, and involves the use of geographical information to detect topics with a spatial component. In this paper, we propose a novel geographical topic model which captures dependencies between geographical regions to support the detection of topics with complex, non-Gaussian distributed spatial structures. The model is based on a multi-Dirichlet process (MDP), a novel generalisation of the hierarchical Dirichlet process extended to support multiple base distributions. Our method is thus called the MDP-based geographical topic model (MGTM). We show how to use an MDP to dynamically smooth topic distributions between groups of spatially adjacent documents. In systematic quantitative and qualitative evaluations using independent datasets from prior related work, we show that such a model can exploit the adjacency of regions and leads to a significant improvement in the quality of topics compared to the state of the art in geographical topic modelling. |
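
The idea of smoothing topic distributions between spatially adjacent regions can be illustrated with a minimal sketch. This is not the MDP inference procedure from the paper: it assumes a fixed mixing weight (`weight`), whereas the abstract describes smoothing that is learned dynamically. The function name, the region identifiers, and the dictionary layout are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def smooth_topic_distributions(
    theta: Dict[str, List[float]],
    adjacency: Dict[str, List[str]],
    weight: float = 0.5,
) -> Dict[str, List[float]]:
    """Mix each region's topic distribution with the mean of its neighbours.

    theta:     hypothetical mapping region -> topic probabilities (sums to 1)
    adjacency: hypothetical mapping region -> list of adjacent regions
    weight:    assumed fixed share given to the neighbours (0 = no smoothing)
    """
    smoothed: Dict[str, List[float]] = {}
    for region, own in theta.items():
        neighbours = adjacency.get(region, [])
        if not neighbours:
            # Isolated regions keep their own distribution unchanged.
            smoothed[region] = list(own)
            continue
        num_topics = len(own)
        # Average the neighbours' probabilities topic by topic.
        neighbour_mean = [
            sum(theta[n][t] for n in neighbours) / len(neighbours)
            for t in range(num_topics)
        ]
        mixed = [
            (1.0 - weight) * own[t] + weight * neighbour_mean[t]
            for t in range(num_topics)
        ]
        # Renormalise to guard against rounding drift.
        total = sum(mixed)
        smoothed[region] = [p / total for p in mixed]
    return smoothed


if __name__ == "__main__":
    theta = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
    adjacency = {"a": ["b"], "b": ["a"], "c": []}
    print(smooth_topic_distributions(theta, adjacency, weight=0.25))
```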

Year | DOI | Venue |
---|---|---|
2014 | 10.1145/2556195.2556218 | WSDM |

Keywords | DocType | Citations |
---|---|---|
geographical information, smooth topic distribution, geographical region, information retrieval, mdp-based geographical topic model, novel geographical topic model, photo collection, large geo-tagged corpus, geographical topic modelling, non-gaussian geographical topic, large collection, hierarchical dirichlet process, topic models, graphical models | Conference | 19 |

PageRank | References | Authors |
---|---|---|
0.62 | 14 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Christoph Carl Kling | 1 | 24 | 2.62 |
Jérôme Kunegis | 2 | 874 | 51.20 |
Sergej Sizov | 3 | 545 | 37.91 |
Steffen Staab | 4 | 6658 | 593.89 |