Title
A knowledge-based recognition system for historical Mongolian documents.
Abstract
This paper proposes a knowledge-based system to recognize historical Mongolian documents in which the words exhibit remarkable variation and character overlapping. According to the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that cannot be segmented into glyph-units, or that do not require segmentation, are recognized using the holistic scheme. The remaining words are recognized using the segmentation-based scheme, which is the focus of this paper. In the segmentation-based scheme, we exploit knowledge of the glyph characteristics to segment words into glyph-units. Convolutional neural networks are employed not only for word recognition in the holistic scheme, but also for glyph-unit recognition in the segmentation-based scheme. Based on an analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition: incorporating baseline information, grouping glyph-units, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86% word accuracy on the Mongolian Kanjur test samples.
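The two-scheme design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (recognize_word, use_holistic, holistic_cnn, segment, glyph_cnn, word_accuracy) is hypothetical, the classifiers and segmenter are passed in as plain callables, and the accuracy measure assumes the common definition of exact-match words divided by total words, which the abstract does not spell out.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a word image as rows of pixel values.
Image = Sequence[Sequence[int]]


def recognize_word(
    image: Image,
    use_holistic: Callable[[Image], bool],    # assumed rule that picks the scheme
    holistic_cnn: Callable[[Image], str],     # assumed whole-word classifier
    segment: Callable[[Image], List[Image]],  # assumed glyph-unit segmenter
    glyph_cnn: Callable[[Image], str],        # assumed glyph-unit classifier
) -> str:
    """Route a word to the holistic or the segmentation-based scheme."""
    if use_holistic(image):
        # Words that cannot be segmented, or need no segmentation,
        # are classified as a whole.
        return holistic_cnn(image)
    # Otherwise split into glyph-units, classify each unit,
    # and concatenate the labels in order.
    return "".join(glyph_cnn(unit) for unit in segment(image))


def word_accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of words whose predicted label matches exactly (assumed metric)."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Passing the components as callables keeps the routing logic independent of any particular network or segmentation rule, so the three enhancement strategies named in the abstract could be added inside the glyph-unit classifier without changing the dispatcher.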
Year: 2016
DOI: 10.1007/s10032-016-0267-1
Venue: IJDAR
Keywords: Historical Mongolian document, Holistic recognition, Segmentation-based recognition, Convolutional neural network, Knowledge-based strategy, Optical character recognition
Field: Glyph, Word formation, Intelligent character recognition, Pattern recognition, Segmentation, Convolutional neural network, Computer science, Word recognition, Optical character recognition, Artificial intelligence, Intelligent word recognition
DocType: Journal
Volume: 19
Issue: 3
ISSN: 1433-2825
Citations: 3
PageRank: 0.50
References: 19
Authors: 4
Name            Order   Citations   PageRank
Xiangdong Su    1       18          4.68
Guanglai Gao    2       78          24.57
Hongxi Wei      3       35          5.71
Fei Long        4       16          13.09