Paper Info

Title
Scene Text Recognition with Cascade Attention Network

Abstract
Scene text recognition (STR) has become increasingly popular in both academia and industry. Treating STR as a sequence prediction task, most state-of-the-art (SOTA) approaches employ an attention-based encoder-decoder architecture to recognize text. However, these methods still struggle to localize the precise alignment center associated with the current character, a phenomenon known as attention drift. One major reason is that directly converting low-quality or distorted word images into sequential features may introduce confusing information and thus mislead the network. To address this problem, this paper proposes a cascade attention network. The model is composed of three novel attention modules: a vanilla attention module that attends to sequential features along the horizontal direction; a cross-network attention module that exploits both one-dimensional contextual information and two-dimensional visual distributions; and an aspects fusion attention module that fuses spatial and channel-wise information. As a result, the network yields distinctive, refined representations correlated with the target sequence. Experimental results on seven benchmarks demonstrate that the framework outperforms SOTA methods in recognizing scene text under various conditions.
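
The idea of attending to sequential features and locating an alignment center can be illustrated with a minimal sketch. This is generic scaled dot-product attention in plain Python, not the paper's cascade attention network; the function names (`attend`, `alignment_center`), the toy feature values, and the definition of the alignment center as the attention-weighted mean position are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """One decoding step of scaled dot-product attention (illustrative only).

    query:    decoder state for the current character, a list of d floats
    features: T encoder feature vectors ordered left to right, each of length d
    Returns (weights, context): weights sum to 1 over the T positions,
    context is the weighted sum of the feature vectors.
    """
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
              for feat in features]
    weights = softmax(scores)
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(d)]
    return weights, context

def alignment_center(weights):
    """Assumed definition: attention-weighted mean horizontal position."""
    return sum(t * w for t, w in enumerate(weights))

# Toy input: four horizontal positions with 2-dimensional features (made up).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
query = [0.0, 4.0]  # hypothetical decoder state that matches position 1

weights, context = attend(query, features)
center = alignment_center(weights)
```

In this toy setting the weights peak at position 1, and the alignment center sits close to it. Under the assumed definition, attention drift would correspond to the center moving away from the position of the character actually being decoded.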

Year	2021
DOI	10.1145/3460426.3463639
Venue	International Multimedia Conference
Keywords	Scene text recognition, Cascade attention network, Attention drift
DocType	Conference
Citations	0
PageRank	0.34
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Min Zhang	1	27	17.07
Meng Ma	2	78	15.71
Ping Wang	3	93	44.15

Cited by (0 rows)

References (0 rows)
