Title |
---|
An automatic linking service of document images reducing the effects of OCR errors with latent semantics |
Abstract |
---|
Robust Information Retrieval (IR) systems are in demand because document images are used widely and for many purposes, and because many document image repositories are available today. This paper presents a novel approach that supports the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). The LinkDI service extracts and indexes the content of document images, obtains its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their respective document images. Results show that LinkDI is feasible for relating highly degraded OCR output. |
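
The pipeline outlined in the abstract (index the text of each document image, derive its latent semantics with LSI, and create hyperlinks between related documents) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not LinkDI's implementation. The OCR step is assumed to have already produced plain text for each page. The smoothed TF-IDF weighting, the number of latent dimensions `k`, the cosine `threshold`, and all function names are hypothetical choices. The truncated SVD is obtained by power iteration on the document Gram matrix, so the sketch needs only the standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a crude stand-in for OCR text cleanup)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return a TF-IDF weighted term x document matrix as a list of rows."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    rows = []
    for term in vocab:
        df = sum(1 for c in counts if term in c)
        idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF (assumed weighting)
        rows.append([c[term] * idf for c in counts])
    return rows


def top_eigenpairs(matrix, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration and deflation."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        # Fixed, non-uniform start vector, normalized to unit length.
        v = [float(i + 1) for i in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi_document_vectors(docs, k):
    """Project documents into a k-dimensional latent space (rows of V_k S_k)."""
    a = tfidf_matrix(docs)
    n = len(docs)
    # A^T A = V S^2 V^T: its eigenvectors are the right singular vectors of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]


def cosine(x, y):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def link_documents(docs, k=2, threshold=0.5):
    """Return index pairs (i, j) whose latent-space cosine similarity >= threshold."""
    vecs = lsi_document_vectors(docs, k)
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if cosine(vecs[i], vecs[j]) >= threshold
    ]


if __name__ == "__main__":
    pages = [
        "ocr scanned document image text recognition",
        "ocr document image text recognition errors",
        "pasta tomato sauce recipe basil",
        "pasta tomato recipe garlic basil",
    ]
    print(link_documents(pages, k=2, threshold=0.5))  # [(0, 1), (2, 3)]
```

Pages are compared as dense `k`-dimensional vectors rather than raw term vectors, so two pages can still be linked through co-occurring terms when OCR errors remove some of the words they share. The threshold controls how many hyperlinks are created.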
Year | DOI | Venue |
---|---|---|
2010 | 10.1145/1774088.1774092 | Proceedings of the ACM Symposium on Applied Computing |
Keywords | Field | DocType |
latent semantic indexing,indexation,optical character recognition,visual analytics,information retrieval | Latent semantic indexing,Information retrieval,Document clustering,Computer science,Visual analytics,Optical character recognition,Hyperlink,Semantics | Conference |
Citations | PageRank | References |
1 | 0.35 | 10 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Renato Bulcão Neto | 1 | 46 | 7.68 |
José Antonio Camacho Guerrero | 2 | 18 | 2.79 |
Álvaro Barreiro | 3 | 1 | 0.35 |
Javier Parapar | 4 | 188 | 25.91 |
Alessandra A. Macedo | 5 | 21 | 4.52 |