Title
Scene Text Recognition with Cascade Attention Network
Abstract
Scene text recognition (STR) has become increasingly popular in both academia and industry. Treating STR as a sequence prediction task, most state-of-the-art (SOTA) approaches employ an attention-based encoder-decoder architecture to recognize text. However, these methods still struggle to localize the precise alignment center associated with the current character, a problem known as the attention drift phenomenon. One major reason is that directly converting low-quality or distorted word images to sequential features may introduce confusing information and thus mislead the network. To address the problem, this paper proposes a cascade attention network. The model is composed of three novel attention modules: a vanilla attention module that attends to sequential features along the horizontal direction, a cross-network attention module that takes advantage of both one-dimensional contextual information and two-dimensional visual distributions, and an aspects fusion attention module that fuses spatial and channel-wise information. Accordingly, the network yields distinguished and refined representations correlated with the target sequence. Experimental results on seven benchmarks demonstrate the superiority of our framework over SOTA methods in recognizing scene text under various conditions.
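The abstract mentions an attention module that attends to sequential features along the horizontal direction, but gives no formulas. As a rough illustration only, the sketch below assumes plain dot-product scoring between one decoder query and a sequence of column features. The function names `softmax` and `attend`, the scoring function, and the toy input are all assumptions made for this example, not the paper's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention of one query over a feature sequence.

    features: list of T feature vectors, one per horizontal position.
    Returns (weights, context): the weights sum to 1 over the T positions,
    and context is the weighted sum of the feature vectors.
    """
    # Assumed scoring function: dot product between query and each feature.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three horizontal positions with 2-d features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 5.0]
weights, context = attend(query, features)
```

In this picture, the position with the largest weight plays the role of the alignment center for the current character. Attention drift corresponds to that peak landing on the wrong position.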
Year
2021
DOI
10.1145/3460426.3463639
Venue
International Multimedia Conference
Keywords
Scene text recognition, Cascade attention network, Attention drift
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
3
Name        Order  Citations  PageRank
Min Zhang   1      27         17.07
Meng Ma     2      78         15.71
Ping Wang   3      93         44.15