Abstract | ||
---|---|---|
Generating a coherent synopsis and revealing the development threads of news stories from ever-increasing amounts of news content remains a formidable challenge. In this paper, we propose an hddCRP (hybrid distance-dependent Chinese Restaurant Process) based HierARChical tOpic model for news Article cLustering, abbreviated as CHARCOAL. Given a collection of news articles, the outcome of CHARCOAL is threefold: 1) it aggregates relevant news articles into clusters (i.e., stories); 2) it disentangles the chain links (i.e., the storyline) between articles within the story they describe; 3) it discerns the topic to which each story is assigned (e.g., the Malaysia Airlines Flight 370 story belongs to the aircraft accident topic, and U.S. presidential election stories belong to the politics topic). CHARCOAL accomplishes this by using an hddCRP as a prior and the entities that appear in news articles (e.g., names of persons, organizations, or locations) as clues. Moreover, the non-parametric nature of CHARCOAL lets the model adaptively learn the appropriate number of stories and topics from the news corpus. The experimental analysis and results demonstrate both the interpretability and the superiority of the proposed approach. |
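
To make the role of the prior more concrete, the following is a minimal sketch of a plain sequential distance-dependent CRP, not the hybrid variant used by CHARCOAL, whose details are not given in the abstract. Each article either links to an earlier article or to itself, and linked articles end up in the same story, so a single set of links yields both the clusters and the chains between articles. The function names `sample_ddcrp_links` and `stories_from_links`, the parameters `alpha` and `decay`, and the exponential decay over time gaps are all assumptions chosen for illustration.

```python
import math
import random


def sample_ddcrp_links(times, alpha=1.0, decay=1.0, seed=0):
    """Draw one link per article from a sequential ddCRP prior.

    `times` are publication times sorted in ascending order (an assumption
    of this sketch). Article i links to an earlier article j with weight
    exp(-(t_i - t_j) / decay), or to itself with weight `alpha`.
    """
    rng = random.Random(seed)
    links = []
    for i, t_i in enumerate(times):
        # Candidates: every earlier article j < i, then the self-link at index i.
        weights = [math.exp(-(t_i - times[j]) / decay) for j in range(i)]
        weights.append(alpha)
        links.append(rng.choices(range(i + 1), weights=weights)[0])
    return links


def stories_from_links(links):
    """Label each article with the index of the article that starts its chain."""
    story = []
    for i, j in enumerate(links):
        # A self-link opens a new story; otherwise inherit the linked article's story.
        story.append(i if j == i else story[j])
    return story


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 50.0, 50.1]  # hypothetical timestamps
    links = sample_ddcrp_links(times)
    print(links, stories_from_links(links))
```

Because the number of self-links is not fixed in advance, the number of stories is determined by the sampled links rather than set by hand, which is the non-parametric behaviour the abstract refers to.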
Year | Venue | Field |
---|---|---|
2015 | IJCAI | Interpretability, Chinese restaurant process, Information retrieval, Presidential election, Computer science, Nonparametric statistics, Artificial intelligence, Topic model, Cluster analysis, Politics, Sketch |
DocType | Citations | PageRank |
---|---|---|
Conference | 4 | 0.41 |
References | Authors |
---|---|
8 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Siliang Tang | 1 | 179 | 33.98 |
Fei Wu | 2 | 2209 | 153.88 |
Si Li | 3 | 4 | 0.75 |
weiming | 4 | 147 | 25.70 |
Zhongfei (Mark) Zhang | 5 | 2451 | 164.30 |
Yue-Ting Zhuang | 6 | 3549 | 216.06 |