Title
Text Recognition - Real World Data And Where To Find Them
Abstract
We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their possibly erroneous transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit-distance-guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT). The method is applied to two weakly annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state-of-the-art recognition model, by 3.7% on average across different benchmark datasets (image domains) and by 24.5% on one of the weakly annotated datasets.
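The matching step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a weak annotation is simply a list of words known to appear somewhere in the image, and that a recognised transcription is accepted when its Levenshtein distance to the closest annotation word is small relative to that word's length. The function names and the `max_rel_dist` threshold are hypothetical and not taken from the paper, and the neighbourhood search over region proposals is not covered here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def match_to_weak_annotations(transcription, annotations, max_rel_dist=0.25):
    """Return (closest_word, distance), or None if nothing is close enough.

    max_rel_dist is an illustrative threshold (an assumption), expressed
    as a fraction of the length of the candidate annotation word.
    """
    if not annotations:
        return None
    t = transcription.lower()
    best = min(annotations, key=lambda w: edit_distance(t, w.lower()))
    dist = edit_distance(t, best.lower())
    if dist / max(len(best), 1) > max_rel_dist:
        return None
    return best, dist
```

For example, under these assumptions a noisy transcription such as "c0ffee" would be matched to the annotation word "coffee" at distance 1, while a transcription unrelated to every annotation word would be rejected.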
Year
2020
DOI
10.1109/ICPR48806.2021.9412868
Venue
2020 25th International Conference on Pattern Recognition (ICPR)
DocType
Conference
ISSN
1051-4651
Citations
0
PageRank
0.34
References
0
Authors
4
Name                  Order  Citations  PageRank
Klára Janoušková      1      0          0.34
Jiri Matas            2      0          0.34
Lluís Gómez           3      93         8.74
Dimosthenis Karatzas  4      406        38.13