Title
Temporal corpus summarization using submodular word coverage
Abstract
In many areas of life, we now have almost complete electronic archives reaching back for well over two decades. This includes, for example, the body of research papers in computer science, all news articles written in the US, and most people's personal email. However, we have only rather limited methods for analyzing and understanding these collections. While keyword-based retrieval systems allow efficient access to individual documents in archives, we still lack methods for understanding a corpus as a whole. In this paper, we explore methods that provide a temporal summary of such corpora in terms of landmark documents, authors, and topics. In particular, we explicitly model the temporal nature of influence between documents and re-interpret summarization as a coverage problem over words anchored in time. The resulting models provide monotone submodular objectives for computing informative and non-redundant summaries over time, which can be efficiently optimized with greedy algorithms. Our empirical study shows the effectiveness of our approach over several baselines.
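The abstract's core idea, summarization as greedy maximization of a monotone submodular word-coverage objective, can be illustrated with a minimal sketch. This is not the authors' model: it ignores the temporal anchoring of words and the influence between documents, and the names `greedy_summary`, `coverage`, `docs`, `weights`, and `k` are hypothetical. It assumes each document is a set of words, each word has a non-negative weight (defaulting to 1.0), and the summary is limited to `k` documents.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], docs: Dict[str, Set[str]],
             weights: Dict[str, float]) -> float:
    """Total weight of the distinct words covered by the selected documents."""
    covered: Set[str] = set()
    for doc_id in selected:
        covered |= docs[doc_id]
    # Words without an explicit weight count as 1.0 (an assumption of this sketch).
    return sum(weights.get(word, 1.0) for word in covered)


def greedy_summary(docs: Dict[str, Set[str]], weights: Dict[str, float],
                   k: int) -> List[str]:
    """Pick up to k documents, each time adding the one with the largest
    marginal gain in covered word weight."""
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(min(k, len(docs))):
        best_id, best_gain = None, 0.0
        for doc_id in sorted(docs):  # sorted for deterministic tie-breaking
            if doc_id in selected:
                continue
            # Marginal gain: weight of words this document adds to the cover.
            gain = sum(weights.get(w, 1.0) for w in docs[doc_id] - covered)
            if gain > best_gain:
                best_id, best_gain = doc_id, gain
        if best_id is None:  # no remaining document adds new words
            break
        selected.append(best_id)
        covered |= docs[best_id]
    return selected


if __name__ == "__main__":
    toy_docs = {
        "a": {"submodular", "coverage", "greedy"},
        "b": {"submodular", "coverage"},
        "c": {"archive"},
    }
    summary = greedy_summary(toy_docs, {}, k=2)
    print(summary, coverage(summary, toy_docs, {}))
```

Because weighted coverage is monotone and submodular, this greedy rule is known to reach at least a (1 - 1/e) fraction of the best achievable value under a cardinality limit, which is why non-redundant documents are preferred: a document whose words are already covered contributes no gain.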
Year
2012
DOI
10.1145/2396761.2396857
Venue
CIKM
Keywords
coverage problem, individual document, electronic archives, keyword-based retrieval system, temporal summary, temporal corpus summarization, greedy algorithm, efficient access, computer science, temporal nature, empirical study, submodular word coverage, summarization, submodular, temporal
Field
Data mining, Automatic summarization, Information retrieval, Computer science, Submodular set function, Baseline (configuration management), Greedy algorithm, Artificial intelligence, Natural language processing, Landmark, Monotone polygon, Empirical research
DocType
Conference
Citations
24
PageRank
0.90
References
24
Authors
4
Name                Order  Citations  PageRank
Ruben Sipos         1      77         4.38
Adith Swaminathan   2      229        12.68
Pannaga Shivaswamy  3      127        5.15
Thorsten Joachims   4      17387      1254.06