Abstract | ||
---|---|---|
Recognizing irregular text in natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks treat this task as a text sequence labeling problem, and most capture the sequence from only a single-granularity visual representation, which to some extent limits recognition performance. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, each containing a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. Extensive experiments show that the proposed network achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR, with shorter training time. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3446971 | ACM/IMS Transactions on Data Science |
DocType | Volume | Issue |
Journal | 2 | 2 |
ISSN | Citations | PageRank |
2691-1922 | 0 | 0.34 |
References | Authors | |
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hongchao Gao | 1 | 0 | 2.70 |
Yujia Li | 2 | 0 | 0.34 |
Jiao Dai | 3 | 0 | 0.34 |
Xi Wang | 4 | 0 | 1.69 |
Jizhong Han | 5 | 0 | 0.34 |
Ruixuan Li | 6 | 405 | 69.47 |