Abstract | ||
---|---|---|
Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topic coherence and to determine how closely each of the metrics aligns with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets. |

Year | Venue | Field |
---|---|---|
2016 | ECIR | Data science, Semantic similarity, Data mining, Pairwise comparison, Latent Dirichlet allocation, Information retrieval, Crowdsourcing, Computer science, Coherence (physics), Topic model, Latent semantic analysis, Pointwise mutual information |

DocType | Citations | PageRank |
---|---|---|
Conference | 7 | 0.58 |

References | Authors |
---|---|
11 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Anjie Fang | 1 | 35 | 5.93 |
Craig Macdonald | 2 | 2588 | 178.50 |
Iadh Ounis | 3 | 3438 | 234.59 |
Philip Habel | 4 | 34 | 2.88 |
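
The abstract refers to a Pointwise Mutual Information-based coherence metric computed against a separate background dataset. As a rough illustration of that general idea only, the sketch below scores a topic by the average pairwise PMI of its top words, with probabilities estimated from document-level co-occurrence in a background collection of tokenized tweets. The function name `pmi_coherence`, the smoothing constant `eps`, and the toy background data are assumptions made for this example; the paper's exact formulation may differ.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's words (illustrative sketch).

    Probabilities are estimated as the fraction of background documents
    that contain a word (or both words of a pair). `eps` is a hypothetical
    smoothing constant that avoids log(0) and division by zero.
    """
    doc_sets = [set(doc) for doc in background_docs]
    n_docs = len(doc_sets)
    if n_docs == 0 or len(topic_words) < 2:
        raise ValueError("need at least one document and two topic words")

    def prob(*words):
        # Fraction of background documents containing all given words.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)


# Toy background collection of tokenized tweets (made-up data).
background = [
    ["election", "vote", "poll"],
    ["election", "vote", "debate"],
    ["football", "goal", "match"],
    ["football", "match", "vote"],
]

coherent = pmi_coherence(["election", "vote"], background)
mixed = pmi_coherence(["election", "football"], background)
```

In this toy setting, words that co-occur in the background tweets ("election", "vote") receive a higher score than words that never appear together ("election", "football"), which is the behaviour a coherence metric of this kind is meant to capture.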