Title
An automatic linking service of document images reducing the effects of OCR errors with latent semantics
Abstract
The widespread, multipurpose use of document images and the large number of document image repositories now available have created demand for robust Information Retrieval (IR) systems. This paper presents a novel approach to automatically generating relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). The LinkDI service extracts and indexes the content of document images, obtains its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with the quality of those created among their corresponding document images. The results show that LinkDI is feasible for relating OCR output even when that output is highly degraded.
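The pipeline described in the abstract (index OCR text, project it into a latent semantic space, link similar documents) can be illustrated with a minimal sketch. This is not the LinkDI implementation: the function names, the raw term-count weighting, the rank `k`, the similarity threshold, and the toy OCR strings are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Build a raw term-count matrix (terms x documents) from whitespace tokens."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            matrix[index[tok], j] += 1.0
    return matrix, vocab


def lsi_document_vectors(matrix, k):
    """Project documents into a rank-k latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Each row is one document expressed in the k latent dimensions.
    return (np.diag(s[:k]) @ vt[:k]).T


def link_documents(doc_vectors, threshold):
    """Return (i, j, cosine) for every document pair at or above the threshold."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = doc_vectors / norms
    sims = unit @ unit.T
    links = []
    for i in range(len(doc_vectors)):
        for j in range(i + 1, len(doc_vectors)):
            if sims[i, j] >= threshold:
                links.append((i, j, float(sims[i, j])))
    return links


# Hypothetical toy data: a clean text, a simulated OCR version of it with
# character errors ("semant1c", "docurnent"), and an unrelated document.
docs = [
    "latent semantic indexing of document images",
    "latent semant1c indexing of docurnent images",
    "football match results and league table",
]
matrix, _ = build_term_document_matrix(docs)
links = link_documents(lsi_document_vectors(matrix, k=2), threshold=0.9)
```

In this toy setting the clean text and its degraded OCR counterpart end up linked, while the unrelated document does not, because the terms the two versions still share dominate the first latent dimension. Real collections would normally call for term weighting (for example TF-IDF) and a larger rank, and those choices are left open here.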
Year
2010
DOI
10.1145/1774088.1774092
Venue
Proceedings of the ACM Symposium on Applied Computing
Keywords
latent semantic indexing, indexation, optical character recognition, visual analytics, information retrieval
Field
Latent semantic indexing, Information retrieval, Document clustering, Computer science, Visual analytics, Optical character recognition, Hyperlink, Semantics
DocType
Conference
Citations
1
PageRank
0.35
References
10
Authors
5