Title
How To Efficiently Increase Resolution in Neural OCR Models
Abstract
Modern CRNN OCR models require a fixed line height for all input images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of input images without changing the CRNN architecture carries a large cost in memory and computation (both scale as O(n^2) with respect to the input line height). We introduce a few very small convolutional and max pooling layers into a CRNN model to rapidly downsample high-resolution images to a more manageable resolution before passing them to the "base" CRNN model. Doing this greatly improves recognition performance with only a modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9%, when increasing the input resolution from a 30px line height to a 240px line height on Open-HART/MADCAT Arabic handwriting data. This is a new state-of-the-art result on Arabic handwriting, and the large improvement over an already strong baseline shows the impact of this technique.
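The core idea described in the abstract, prepending a small convolution and max-pooling stack that downsamples a high-resolution line image before the base CRNN sees it, can be sketched as follows. This is a minimal illustration only, assuming a PyTorch implementation; the layer counts, channel widths, and the DownsamplingFrontEnd name are illustrative assumptions rather than the authors' exact configuration. Three 2x2 pooling stages give the 8x height reduction needed to go from a 240px line image down to the 30px resolution a baseline CRNN might expect.

# Minimal sketch (assumption: PyTorch; layer counts and channel widths are
# illustrative, not the paper's exact configuration).
import torch
import torch.nn as nn

class DownsamplingFrontEnd(nn.Module):
    """Small conv + max-pool stack that reduces a high-resolution text-line
    image (e.g. 240px tall) to the resolution the base CRNN expects
    (e.g. 30px tall) before any expensive processing happens."""

    def __init__(self, in_channels: int = 1, out_channels: int = 16):
        super().__init__()
        self.layers = nn.Sequential(
            # Each block: a cheap 3x3 convolution followed by 2x2 max pooling,
            # halving both height and width. Three blocks give 8x
            # downsampling (240px -> 30px line height).
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Example: a batch of 240px-tall grayscale line images is reduced to
# 30px-tall feature maps before being handed to the base CRNN.
if __name__ == "__main__":
    frontend = DownsamplingFrontEnd()
    lines = torch.randn(4, 1, 240, 1024)   # (batch, channels, height, width)
    features = frontend(lines)
    print(features.shape)                  # torch.Size([4, 16, 30, 128])

Because the downsampling happens in the first few cheap layers, the expensive recurrent part of the CRNN still operates at roughly the original low resolution, which is why the reported memory and compute overhead stays modest.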
Year
2018
DOI
10.1109/ASAR.2018.8480182
Venue
2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)
Keywords
fixed line height, input resolution, recognition performance, input images, CRNN architecture, computation, input line height, convolutional pooling layers, max pooling layers, manageable resolution, base CRNN model, memory requirements, neural OCR models, CRNN OCR models, downsample high resolution images
Field
Decimation, Pattern recognition, Arabic handwriting, Computer science, Pooling, Artificial intelligence, Computation
DocType
Conference
ISBN
978-1-5386-1460-0
Citations
1
PageRank
0.41
References
0
Authors
4
Name                 Order  Citations  PageRank
Stephen Rawls        1      59         4.08
Huaigu Cao           2      347        29.09
Joe Mathai           3      3          0.79
Premkumar Natarajan  4      874        79.46