Title
Automatic Alignment of Handwritten Images and Transcripts for Training Handwritten Text Recognition Systems
Abstract
State-of-the-art Handwritten Text Recognition techniques are based on statistical models, such as hidden Markov models or recurrent neural networks for optical modeling of characters, and N-grams for language modeling. These models are trained with well-known learning techniques such as Expectation-Maximization and backpropagation, so training data is needed to build them. For the optical models, the training data consist of text line images with their corresponding transcripts. Even when the transcript of a handwritten document is available, automatically matching the physical lines in the images to the lines of the transcript is not an easy task. We present a method for automatically aligning handwritten text images with their respective transcripts. The approach automatically segments the images into lines and then recognizes them. An alignment confidence is obtained from the Levenshtein distance between the recognition results and the transcripts. The most confident lines are then used for training. Experiments carried out on a historical document show encouraging results.
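The confidence step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization by the longer string's length, the greedy best-match search, the 0.8 threshold, and the names `confidence` and `select_confident_lines` are all assumptions made for illustration. Only the use of the Levenshtein distance between recognition output and transcript lines comes from the abstract.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def confidence(hypothesis: str, reference: str) -> float:
    """Assumed score in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(hypothesis), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(hypothesis, reference) / longest


def select_confident_lines(
    hypotheses: List[str],
    transcript_lines: List[str],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, str, float]]:
    """Pair each recognized line with its closest transcript line and keep
    only the pairs whose confidence reaches the threshold."""
    if not transcript_lines:
        return []
    selected = []
    for index, hyp in enumerate(hypotheses):
        best = max(transcript_lines, key=lambda ref: confidence(hyp, ref))
        score = confidence(hyp, best)
        if score >= threshold:
            selected.append((index, best, score))
    return selected
```

Under these assumptions, a recognized line such as "the quick brwn fox" would be paired with the transcript line "the quick brown fox" and kept for training, while a line with no close transcript match would be discarded.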
Year
2018
DOI
10.1109/DAS.2018.41
Venue
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Keywords
Automatic alignment, Text line segmentation, Handwritten text recognition
Field
Data modeling, Pattern recognition, Computer science, Levenshtein distance, Recurrent neural network, Image segmentation, Real-time computing, Artificial intelligence, Backpropagation, Hidden Markov model, Language model, Historical document
DocType
Conference
ISBN
978-1-5386-3347-2
Citations
0
PageRank
0.34
References
14
Authors
5