Abstract |
---|
In this paper, we study the problem of math formula recognition (MFR) in degraded Chinese document images. Compared to traditional optical character recognition (OCR), MFR poses new challenges in character segmentation and structural analysis, especially in degraded images. To tackle these issues, we propose an over-segmentation strategy, based on a convolutional neural network (CNN), to split and recognize touching (adhesive) formula elements. In addition, we propose a hierarchical framework for formula structure analysis that builds the formula top-down, iteratively splitting regions into recognizable units. Because the community lacks degraded Chinese document images containing math formulas, we also collect a diverse ground-truth dataset of 100 images submitted by users of our system. Extensive experiments demonstrate the effectiveness and robustness of the proposed method compared with state-of-the-art methods. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ICDAR.2017.27 | 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) |
Keywords | Field | DocType |
Math Formula Recognition,Chinese Document Image,Convolutional Neural Network | Structure analysis,Pattern recognition,Character recognition,Convolutional neural network,Segmentation,Computer science,Optical character recognition,Image segmentation,Robustness (computer science),Artificial intelligence,Text recognition | Conference |
Volume | ISSN | ISBN |
01 | 1520-5363 | 978-1-5386-3587-2 |
Citations | PageRank | References |
1 | 0.35 | 6 |
Authors |
---|
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ning Liu | 1 | 88 | 31.20 |
Dongxiang Zhang | 2 | 743 | 43.89 |
Xing Xu | 3 | 764 | 62.73 |
Long Guo | 4 | 11 | 3.67 |
Lijiang Chen | 5 | 304 | 23.22 |
Wenju Liu | 6 | 214 | 39.32 |
Dengfeng Ke | 7 | 12 | 6.51 |