Title
Robust Math Formula Recognition in Degraded Chinese Document Images
Abstract
In this paper, we study the problem of math formula recognition (MFR) in degraded Chinese document images. Compared to traditional optical character recognition (OCR), MFR poses additional challenges in character segmentation and structural analysis, especially in degraded images. To tackle these issues, we propose an over-segmentation strategy, based on a convolutional neural network (CNN), to split and recognize touching (adhesive) formula elements. In addition, we propose a hierarchical framework for formula structure analysis that constructs the formula in a top-down manner, iteratively splitting regions into recognizable units. Because the community lacks degraded Chinese document images containing math formulas, we also collect a diverse ground-truth dataset of 100 images submitted by users of our system. Extensive experiments demonstrate the effectiveness and robustness of the proposed method in comparison with state-of-the-art methods.
Year: 2017
DOI: 10.1109/ICDAR.2017.27
Venue: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
Keywords: Math Formula Recognition, Chinese Document Image, Convolutional Neural Network
Field: Structure analysis, Pattern recognition, Character recognition, Convolutional neural network, Segmentation, Computer science, Optical character recognition, Image segmentation, Robustness (computer science), Artificial intelligence, Text recognition
DocType: Conference
Volume: 01
ISSN: 1520-5363
ISBN: 978-1-5386-3587-2
Citations: 1
PageRank: 0.35
References: 6
Authors (7)
Name              Order  Citations  PageRank
Ning Liu          1      88         31.20
Dongxiang Zhang   2      743        43.89
Xing Xu           3      764        62.73
Long Guo          4      11         3.67
Lijiang Chen      5      304        23.22
Wenju Liu         6      214        39.32
Dengfeng Ke       7      12         6.51