Abstract | ||
---|---|---|
We propose a novel approach to assist content transcription of handwritten digital documents. The approach adopts a segmentation-based keyword retrieval method that follows the query-by-string paradigm and exploits user validation of the retrieved words to improve its performance during operation. Our approach starts with an initial training set, which contains only a few pages and a tentative list of words presumed to appear in the document, and iteratively interleaves a word retrieval step by the system with a validation step by the user. After each iteration, the system exploits the results of the validation to update its internal model, so as to use that evidence in further iterations of the search. Experimental results on the Bentham dataset show that the system can start with a few word images and their transcripts, improves its performance during operation, and after a few iterations is able to correctly transcribe more than 68% of the words in the list. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/ICFHR.2016.0051 | 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) |
Keywords | Field | DocType |
Historical handwritten documents, human in the loop, word retrieval | Pattern recognition, Segmentation, Computer science, Knowledge-based systems, Handwriting recognition, Image segmentation, Exploit, Artificial intelligence, Human-in-the-loop, Hidden Markov model, Machine learning, Internal model | Conference |
ISSN | ISBN | Citations |
2167-6445 | 978-1-5090-0982-4 | 0 |
PageRank | References | Authors |
0.34 | 5 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Adolfo Santoro | 1 | 7 | 2.72 |
Antonio Parziale | 2 | 25 | 5.66 |
Angelo Marcelli | 3 | 139 | 32.42 |
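
The retrieve, validate, update cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the record does not describe their features or internal model, so the sketch swaps in a simple nearest-exemplar matcher over toy feature vectors. All names here (`distance`, `retrieve`, `transcribe`, `validate`, `top_k`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Feature = Tuple[float, ...]


def distance(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(model: Dict[str, List[Feature]], word: str,
             pool: Dict[str, Feature], rejected: Set[Tuple[str, str]],
             top_k: int) -> List[str]:
    """Return up to top_k untranscribed image ids closest to any exemplar of `word`."""
    exemplars = model.get(word, [])
    if not exemplars:
        return []
    # Skip pairs the user has already rejected, so they are not asked twice.
    candidates = [i for i in pool if (i, word) not in rejected]
    candidates.sort(key=lambda i: min(distance(pool[i], e) for e in exemplars))
    return candidates[:top_k]


def transcribe(model: Dict[str, List[Feature]], word_list: List[str],
               pool: Dict[str, Feature],
               validate: Callable[[str, str], bool],
               iterations: int = 3, top_k: int = 1) -> Dict[str, str]:
    """Interleave system retrieval with user validation, updating the model each time."""
    model = {w: list(v) for w, v in model.items()}  # work on copies
    pool = dict(pool)
    rejected: Set[Tuple[str, str]] = set()
    transcripts: Dict[str, str] = {}
    for _ in range(iterations):
        for word in word_list:
            for image_id in retrieve(model, word, pool, rejected, top_k):
                if validate(image_id, word):
                    # Confirmed hit: record the transcript and add the image
                    # as a new exemplar, so later searches use this evidence.
                    transcripts[image_id] = word
                    model.setdefault(word, []).append(pool.pop(image_id))
                else:
                    rejected.add((image_id, word))
    return transcripts


# Toy data (assumed): one exemplar per word and four untranscribed images.
truth = {"a": "court", "b": "court", "c": "law", "d": "court"}
pool = {"a": (0.5,), "b": (1.5,), "c": (9.0,), "d": (3.0,)}
seed_model = {"court": [(0.0,)], "law": [(10.0,)]}

# The lambda stands in for the human validator.
result = transcribe(seed_model, ["court", "law"], pool,
                    lambda image_id, word: truth[image_id] == word)
print(result)
```

In this toy run the image `d` is wrongly proposed for "law" in the second pass, rejected by the validator, and then correctly retrieved for "court" once the earlier confirmed images have been added as exemplars, which mirrors the improvement during operation that the abstract reports.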