Title
A framework for text processing and supporting access to collections of digitized historical newspapers
Abstract
Large quantities of historical newspapers are being digitized and processed with optical character recognition (OCR). We describe a framework for processing the OCR text to identify articles and extract their metadata. We describe the article schema and give examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. We also describe visualization and summarization techniques that can be used to present the extracted events.
Year: 2007
DOI: 10.1007/978-3-540-73354-6_26
Venue: HCI (9)
Keywords: structural model, ocrd text, large quantity, digitized historical newspaper, summarization technique, text processing, automatic indexing, community content, historical newspaper, lexical semantics, article schema
Field: Metadata, Automatic summarization, Digitization, Information retrieval, Lexical semantics, Visualization, Computer science, Artificial intelligence, Natural language processing, Digital library, Automatic indexing, Text processing
DocType: Conference
Volume: 4558
ISSN: 0302-9743
Citations: 8
PageRank: 1.03
References: 19
Authors: 4
Name                   Order  Citations  PageRank
Robert B. Allen        1      2030       338.48
Andrea Japzon          2      39         3.17
Palakorn Achananuparp  3      302        23.16
Ki Jung Lee            4      11         2.44