A prototype document image analysis system for technical journals - Citegraph

Paper Info

Title
A prototype document image analysis system for technical journals

Abstract
Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
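
The X-Y tree idea in the abstract can be illustrated with a minimal sketch. This is not the Gobbledoc implementation: the names `Node`, `profile`, `ink_runs`, `xy_cut`, and `leaves`, the binary list-of-lists page, and the `min_gap` parameter are all assumptions made for illustration. Each rectangular block is projected onto one axis to give a 1-D string of '0'/'1' symbols, the string is split at blank gaps, and the resulting sub-blocks are cut again in the other direction. The sketch covers only the segmentation step; the grammar-based labeling of blocks described in the abstract is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class Node:
    """One rectangular block of the page; children are its sub-blocks."""
    box: Box
    children: List["Node"] = field(default_factory=list)


def profile(page, box, horizontal):
    """Project ink in `box` onto one axis as a 1-D string of '0'/'1'."""
    top, left, bottom, right = box
    if horizontal:  # one symbol per pixel row
        return "".join(
            "1" if any(page[r][left:right]) else "0" for r in range(top, bottom)
        )
    # one symbol per pixel column
    return "".join(
        "1" if any(page[r][c] for r in range(top, bottom)) else "0"
        for c in range(left, right)
    )


def ink_runs(symbols, min_gap):
    """Return (start, end) ink spans, merging spans closer than min_gap."""
    runs, start = [], None
    for i, s in enumerate(symbols + "0"):  # sentinel closes a trailing run
        if s == "1" and start is None:
            start = i
        elif s == "0" and start is not None:
            runs.append((start, i))
            start = None
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def xy_cut(page, box, horizontal=True, min_gap=1, tried_other=False):
    """Recursively cut `box`, alternating between row and column cuts."""
    top, left, bottom, right = box
    node = Node(box)
    runs = ink_runs(profile(page, box, horizontal), min_gap)
    if not runs:  # blank block: nothing to segment
        return node

    def sub_box(start, end):
        if horizontal:
            return (top + start, left, top + end, right)
        return (top, left + start, bottom, left + end)

    if len(runs) == 1:
        # No cut in this direction: trim margins, try the other axis once.
        trimmed = sub_box(*runs[0])
        if tried_other:
            node.box = trimmed
            return node
        return xy_cut(page, trimmed, not horizontal, min_gap, True)
    for start, end in runs:
        node.children.append(
            xy_cut(page, sub_box(start, end), not horizontal, min_gap)
        )
    return node


def leaves(node):
    """Collect the boxes of all leaf blocks, in tree order."""
    if not node.children:
        return [node.box]
    return [box for child in node.children for box in leaves(child)]


# Toy page: a full-width header line above two text columns.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
tree = xy_cut(page, (0, 0, len(page), len(page[0])))
print(leaves(tree))
```

On a scanned page, `min_gap` would need to be large enough that inter-line or inter-character gaps do not trigger cuts; the value used here only suits the toy page.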

Year	1992
DOI	10.1109/2.144436
Venue	Document image analysis
Keywords	computerised picture processing, data structures, document image processing, optical character recognition, ASCII document representation, Gobbledoc, X-Y tree data structure, compiler tools, knowledge base, prototype document image analysis system, syntactic document analysis, technical journals
Field	World Wide Web, Information retrieval, Document management system, Computer science, Design Document Listing
DocType	Journal
Volume	25
Issue	7
ISSN	0018-9162
ISBN	0-8186-6547-5
Citations	188
PageRank	31.54
References	1
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
George Nagy	1	913	105.94
Sharad C. Seth	2	671	93.61
Mahesh Viswanathan	3	2264	206.47