A framework for text processing and supporting access to collections of digitized historical newspapers

Paper Info

Title
A framework for text processing and supporting access to collections of digitized historical newspapers

Abstract
Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of them. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.

Year	DOI	Venue
2007	10.1007/978-3-540-73354-6_26	HCI (9)
Keywords	Field	DocType
structural model,ocrd text,large quantity,digitized historical newspaper,summarization technique,text processing,automatic indexing,community content,historical newspaper,lexical semantics,article schema	Metadata,Automatic summarization,Digitization,Information retrieval,Lexical semantics,Visualization,Computer science,Artificial intelligence,Natural language processing,Digital library,Automatic indexing,Text processing	Conference
Volume	ISSN	Citations
4558	0302-9743	8
PageRank	References	Authors
1.03	19	4

Authors (4 rows)

Name	Order	Citations	PageRank
Robert B. Allen	1	2030	338.48
Andrea Japzon	2	39	3.17
Palakorn Achananuparp	3	302	23.16
Ki Jung Lee	4	11	2.44
