Abstract | ||
---|---|---|
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies. |
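
The idea that a binary latent variable gives a soft partition of documents can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single binary latent variable `Z` whose word variables are conditionally independent given `Z` (one building block of a latent tree), and the vocabulary, documents, and probabilities below are hypothetical values chosen by hand rather than learned.

```python
import math


def binarize(docs, vocab):
    """Map each document to a 0/1 vector marking which vocabulary words occur in it."""
    rows = []
    for doc in docs:
        present = set(doc.lower().split())
        rows.append([1 if word in present else 0 for word in vocab])
    return rows


def soft_assignment(x, prior, p_word_given_z):
    """Return P(Z=1 | x) for one binary latent variable Z, assuming the
    binary word variables are conditionally independent given Z."""
    log_joint = []
    for z in (0, 1):
        lp = math.log(prior[z])
        for xi, p in zip(x, p_word_given_z[z]):
            lp += math.log(p if xi else 1.0 - p)
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_joint)
    w0, w1 = (math.exp(lp - m) for lp in log_joint)
    return w1 / (w0 + w1)


# Hypothetical toy data and hand-picked parameters (assumptions, not from the paper).
vocab = ["neural", "network", "court", "law"]
docs = ["neural network training", "court ruling on law", "network law"]
prior = [0.5, 0.5]                 # P(Z=0), P(Z=1)
p_word_given_z = [
    [0.1, 0.2, 0.8, 0.7],          # P(word present | Z=0)
    [0.8, 0.7, 0.1, 0.2],          # P(word present | Z=1)
]

X = binarize(docs, vocab)
posteriors = [soft_assignment(x, prior, p_word_given_z) for x in X]
for doc, p in zip(docs, posteriors):
    print(f"{doc!r}: P(Z=1 | doc) = {p:.3f}")
```

Each document receives a probability of belonging to the `Z=1` cluster instead of a hard label, which is what makes the partition soft. In an HLTM, higher-level latent variables would in turn be connected to lower-level ones such as `Z`, a step this sketch leaves out.
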
Year | DOI | Venue |
---|---|---|
2017 | 10.1016/j.artint.2017.06.004 | Artificial Intelligence |
Keywords | DocType | Volume |
Probabilistic graphical models,Text analysis,Hierarchical latent tree analysis,Hierarchical topic detection | Journal | 250 |
Issue | ISSN | Citations |
1 | 0004-3702 | 7 |
PageRank | References | Authors |
0.46 | 30 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Peixian Chen | 1 | 24 | 1.96 |
Nevin L. Zhang | 2 | 895 | 97.21 |
Tengfei Liu | 3 | 92 | 7.09 |
Leonard K. M. Poon | 4 | 94 | 10.96 |
Zhourong Chen | 5 | 228 | 12.22 |