Title | ||
---|---|---|
EMU: Effective Multi-Hot Encoding Net for Lightweight Scene Text Recognition With a Large Character Set |
Abstract | ||
---|---|---|
Deploying a lightweight deep model for the scene text recognition task on mobile devices has great commercial value. However, the conventional softmax-based one-hot classification module becomes a cumbersome obstacle when handling multiple languages or languages with a large character set (<i>e.g.</i>, Chinese), because the number of model parameters grows rapidly with the number of classes. To this end, we propose an Effective Multi-hot encoding and classification modUle (EMU) for scene text recognition in the scenario of multiple languages or languages with a large character set. Specifically, EMU generates a binary multi-hot label for each class with a real-valued sub-network in the training stage and produces the prediction by calculating the inner product between the multi-hot code and the multi-hot label. Compared to the softmax-based one-hot classifier, EMU significantly reduces the storage requirement and the time cost in the inference stage while retaining similar performance. Furthermore, we design a convolution-feature-based <b>Light</b>weight Trans<b>Former</b> to learn effective features for EMU and consequently develop a lightweight scene text recognition framework, termed <b>Light-Former-EMU</b>. We conduct extensive experiments on seven public English benchmarks and two real-world Chinese challenge benchmarks. Experimental results verify the effectiveness of the proposed EMU and demonstrate the promising performance of the proposed Light-Former-EMU. |
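
The inner-product decoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says the multi-hot labels are produced by a real-valued sub-network during training, whereas this sketch substitutes a fixed combinatorial codebook in which every label has the same number of ones (an assumption that makes the inner-product argmax unambiguous). The function names, the 16-bit code length, the 6000-class vocabulary, and the 256-dimensional feature size are all hypothetical values chosen for illustration.

```python
from itertools import combinations, islice


def make_codebook(num_classes, code_len, num_hot):
    """Assign each class a distinct binary code with exactly `num_hot` ones.

    Assumption: a fixed codebook stands in for the learned labels.
    """
    codes = []
    for hot in islice(combinations(range(code_len), num_hot), num_classes):
        codes.append([1 if i in hot else 0 for i in range(code_len)])
    if len(codes) < num_classes:
        raise ValueError("code_len/num_hot too small for num_classes")
    return codes


def decode(scores, codebook):
    """Return the class whose label has the largest inner product with `scores`."""
    def inner(label):
        return sum(s * b for s, b in zip(scores, label))
    return max(range(len(codebook)), key=lambda c: inner(codebook[c]))


def head_params(feature_dim, out_dim):
    """Weights plus biases of a single linear output layer."""
    return feature_dim * out_dim + out_dim


# Hypothetical setting: 6000 classes encoded with 16 bits, 8 of them hot.
codebook = make_codebook(6000, 16, 8)

# Simulate a soft multi-hot prediction close to the label of class 1234.
scores = [0.9 if bit else 0.1 for bit in codebook[1234]]
print(decode(scores, codebook))   # 1234

# Output-layer size for a hypothetical 256-dimensional feature vector.
print(head_params(256, 6000))     # one-hot head: 1542000 parameters
print(head_params(256, 16))       # multi-hot head: 4112 parameters
```

In this sketch the output layer scales with the code length rather than the number of classes, which is the kind of saving the abstract attributes to multi-hot encoding; the actual figures for EMU depend on details not given here.
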
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TCSVT.2022.3146240 | IEEE Transactions on Circuits and Systems for Video Technology |
Keywords | DocType | Volume |
Multi-hot encoding,multi-hot classifier,transformer,lightweight transformer,scene text recognition | Journal | 32 |
Issue | ISSN | Citations |
8 | 1051-8215 | 0 |
PageRank | References | Authors |
0.34 | 14 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bingcong Li | 1 | 0 | 0.34 |
Xin Tang | 2 | 0 | 0.34 |
Xianbiao Qi | 3 | 103 | 8.25 |
Yihao Chen | 4 | 0 | 0.34 |
Chun-Guang Li | 5 | 310 | 17.35 |
Rong Xiao | 6 | 0 | 0.34 |