Title
Deep Learning for Classification and as Tapped-Feature Generator in Medieval Word-Image Recognition.
Abstract
Historical manuscripts are the main source of information about the past. In recent years, large quantities of historical handwritten documents have been digitized, giving access to a wealth of information about our medieval past. Such digital archives become more useful if automatic indexing and retrieval of document images can be provided to the end users of a digital library. Automatic transcription of a full digital archive using traditional Optical Character Recognition (OCR) is still not possible with sufficient accuracy. When a full transcription is not available, end users are interested in indexing and retrieving the particular document pages that are relevant to them. Hence, recognizing certain keywords within the corpus is sufficient to meet the end users' needs. Recently, deep-learning based methods have shown competence in image classification problems. However, one bottleneck of deep-learning based techniques is that they require a huge number of training samples per class. Since samples per word class are scarce for freshly scanned collections, this is a serious hindrance to the direct use of deep learning for word image recognition in historical document images. This paper investigates the problem of recognizing words from historical document images using a deep-learning based framework for feature extraction and classification, while countering the small number of image samples with off-line data augmentation techniques. Encouraging results (highest accuracy of 90.03%) were obtained while dealing with 365 different word classes.
Year
2018
Venue
DAS
Field
Computer vision, Digitization, Computer science, Optical character recognition, Search engine indexing, Feature extraction, Artificial intelligence, Digital library, Contextual image classification, Automatic indexing, Historical document
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name                  Order  Citations  PageRank
Sukalpa Chanda        1      93         11.20
Emmanuel Okafor       2      0          0.34
Sebastien Hamel       3      2          1.42
Dominique Stutzmann   4      0          2.03
Lambert Schomaker     5      1309       87.50