Title
Sketch the Storyline with CHARCOAL: A Non-Parametric Approach.
Abstract
Generating a coherent synopsis and revealing the development threads of news stories from the ever-increasing volume of news content remains a formidable challenge. In this paper, we propose CHARCOAL, an hddCRP (hybrid distance-dependent Chinese Restaurant Process) based HierARChical tOpic model for news Article cLustering. Given a collection of news articles, the outcome of CHARCOAL is threefold: 1) it aggregates relevant news articles into clusters (i.e., stories); 2) it disentangles the chain links (i.e., the storyline) between the articles within each story; 3) it discerns the topic to which each story is assigned (e.g., the Malaysia Airlines Flight 370 story belongs to the aircraft accident topic and U.S. presidential election stories belong to the politics topic). CHARCOAL accomplishes this by using an hddCRP as the prior and the entities (e.g., names of persons, organizations, or locations) that appear in news articles as clues. Moreover, the non-parametric nature of CHARCOAL allows the model to adaptively learn the appropriate number of stories and topics from the news corpus. The experimental analysis and results demonstrate both the interpretability and the superiority of the proposed approach.
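To make the link-based clustering idea concrete, here is a minimal sketch of sampling from a plain sequential distance-dependent CRP prior. It is not the hybrid hddCRP or the inference procedure of the paper, whose details the abstract does not give. The function names, the exponential decay over time gaps, and the parameters alpha and decay_scale are all illustrative assumptions. Each article links either to an earlier article or to itself; following the links gives the chains, and each connected group of articles is one story, so the number of stories is not fixed in advance.

```python
import math
import random


def sample_ddcrp_links(times, alpha=1.0, decay_scale=1.0, rng=None):
    """Draw one link per article from a sequential ddCRP prior.

    Assumes `times` is sorted in non-decreasing order (hypothetical
    publication times). Article i links to an earlier article j with
    weight exp(-(t_i - t_j) / decay_scale), or to itself with weight alpha.
    """
    rng = rng or random.Random(0)
    links = []
    for i, t_i in enumerate(times):
        # Candidates: every earlier article j < i, then the self-link i.
        weights = [math.exp(-(t_i - times[j]) / decay_scale) for j in range(i)]
        weights.append(alpha)  # a self-link starts a new story
        links.append(rng.choices(range(i + 1), weights=weights, k=1)[0])
    return links


def stories_from_links(links):
    """Group articles into stories by following each link chain to its root."""
    roots = []
    for i, j in enumerate(links):
        # Links only point backwards, so roots[j] is already known.
        roots.append(i if j == i else roots[j])
    stories = {}
    for i, root in enumerate(roots):
        stories.setdefault(root, []).append(i)
    return list(stories.values())
```

For example, stories_from_links([0, 0, 1, 3, 3]) groups five articles into two stories, [[0, 1, 2], [3, 4]]: article 2 follows article 1, which follows article 0, while article 3 starts a new story that article 4 joins.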
Year
2015
Venue
IJCAI
Field
Interpretability, Chinese restaurant process, Information retrieval, Presidential election, Computer science, Nonparametric statistics, Artificial intelligence, Topic model, Cluster analysis, Politics, Sketch
DocType
Conference
Citations
4
PageRank
0.41
References
8
Authors
6
Name | Order | Citations | PageRank
Siliang Tang | 1 | 179 | 33.98
Fei Wu | 2 | 2209 | 153.88
Si Li | 3 | 4 | 0.75
weiming | 4 | 147 | 25.70
Zhongfei (Mark) Zhang | 5 | 2451 | 164.30
Yue-Ting Zhuang | 6 | 3549 | 216.06