Abstract | ||
---|---|---|
For the temporal analysis of news articles or the extraction of temporal expressions from such documents, accurate document creation times are indispensable. While document creation times are available as time stamps or HTML metadata in many cases, depending on the document collection in question, this data can be inaccurate or incomplete in others. Especially in digitally published online news articles, publication times are often missing from the article or inaccurate due to (partial) updates of the content at a later time. In this paper, we investigate the prediction of document creation times for articles in citation networks of digitally published news articles, which provide a network structure of knowledge flows between individual articles in addition to the contained temporal expressions. We explore the evolution of such networks to motivate the extraction of suitable features, which we utilize in a subsequent prediction of document creation times, framed as a regression task. Based on our evaluation of several established machine learning regressors on a large network of English news articles, we show that the combination of temporal and local structural features allows for the estimation of document creation times from the network.
|
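
The regression framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features or regressors used, so the two features here (the median date mentioned in an article and the median creation time of its dated neighbors), the ordinary least squares fit, and all names and toy data are assumptions chosen for illustration. Times are plain day numbers relative to an arbitrary epoch.

```python
from statistics import median


def features(doc, links, known_times, mentioned_dates):
    """Hypothetical feature vector: [bias, temporal feature, structural feature]."""
    # Structural feature: creation times of linked articles that are already dated.
    neighbor_times = [known_times[n] for n in links.get(doc, []) if n in known_times]
    # Temporal feature: dates mentioned in the article text.
    dates = mentioned_dates.get(doc, [])
    if not neighbor_times or not dates:
        return None  # not enough evidence to build both features
    return [1.0, median(dates), median(neighbor_times)]


def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    return sum(w * x for w, x in zip(weights, row))


# Toy network (all values invented): articles a-d have known creation times
# and each links to one dated anchor article; article x is undated.
links = {"a": ["n20"], "b": ["n20"], "c": ["n40"], "d": ["n60"], "x": ["n40"]}
mentioned_dates = {"a": [10], "b": [30], "c": [10], "d": [50], "x": [30]}
known_times = {"n20": 20, "n40": 40, "n60": 60, "a": 16, "b": 26, "c": 26, "d": 56}

train_docs = ["a", "b", "c", "d"]
rows = [features(d, links, known_times, mentioned_dates) for d in train_docs]
targets = [known_times[d] for d in train_docs]

weights = fit(rows, targets)
estimate = predict(weights, features("x", links, known_times, mentioned_dates))
```

The toy targets were constructed as 1 + 0.5 × (mentioned date) + 0.5 × (neighbor date), so the fit recovers that relation and `estimate` for article `x` comes out close to 36. Real data would be noisy, and a more robust regressor and a richer feature set would likely be needed.
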
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3184558.3191633 | WWW '18: The Web Conference 2018, Lyon, France, April 2018 |
Field | DocType | ISBN |
Metadata, World Wide Web, Information retrieval, Computer science, Citation, Temporal expressions, Citation network, Network structure | Conference | 978-1-4503-5640-4 |
Citations | PageRank | References |
0 | 0.34 | 11 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Andreas Spitz | 1 | 49 | 9.19 |
Jannik Strötgen | 2 | 492 | 38.20 |
Michael Gertz | 3 | 325 | 27.07 |