Efficient keyword extraction for meaningful document perception - Citegraph

Paper Info

Title
Efficient keyword extraction for meaningful document perception

Abstract
Keyword extraction is a common technique in the domain of information retrieval. Keywords serve as a minimalistic summary for single documents or document collections, enabling the reader to quickly perceive the main contents of a text. However, they are often not readily available for the documents of interest. Common keyword extraction techniques demand either a large data collection, a learning process, or access to extensive amounts of reference data. By relying on additional linguistic features (e.g. stop word removal), most approaches are language-restricted. Moreover, the extracted keywords usually pertain to the entire document, rather than only to the portion that is of interest to the reader. In this paper, we present an efficient and flexible approach to summarize selections of text within a document. Our solution is based on a keyword extraction algorithm that is applicable to a variety of documents, regardless of language or context. This algorithm relies on the Helmholtz principle and extends a recently presented approach. Our extension covers the features of a weighting algorithm while providing a self-regulation capability to allow for more meaningful results. Furthermore, our approach takes into account the document structure in order to enhance pure statistic summarizations. We evaluate the efficiency of our approach and present results with meaningful examples. In addition, we outline further applications of our approach that allow for enhanced document perception as well as for meaningful document indexing and retrieval.

Year
2011

DOI
10.1145/2034691.2034732

Venue
ACM Symposium on Document Engineering

Keywords
single document, entire document, efficient keyword extraction, flexible approach, keyword extraction algorithm, keyword extraction, enhanced document perception, document collection, meaningful document indexing, meaningful document perception, common keyword extraction technique, document structure, heuristic algorithm, reference data, information retrieval

Field
Data collection, tf–idf, Information retrieval, Heuristic (computer science), Document clustering, Computer science, Keyword extraction, Document Structure Description, Search engine indexing, Database, Stop words

DocType
Conference

Citations
6

PageRank
0.48

References
22

Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Thomas Bohne	1	7	1.18
Sebastian Rönnau	2	78	6.28
Uwe M. Borghoff	3	412	175.51
