An Open Source Tesseract Based Optical Character Recognizer for Bangla Script - Citegraph

Paper Info

Title
An Open Source Tesseract Based Optical Character Recognizer for Bangla Script

Abstract
BanglaOCR, developed by the Center for Research on Bangla Language Processing (CRBLP), is currently the only open source optical character recognition (OCR) software for the Bangla (Bengali) script. Tesseract, maintained by Google, is considered to be one of the most accurate free open source OCR engines currently available. In this paper, we present a new OCR for the Bangla/Bengali script that combines the recognition power of Tesseract and the Bangla script processing power of BanglaOCR by integrating the Tesseract recognition engine into BanglaOCR. We first present the complete methodology to build the combined OCR, followed by the implementation strategy. We focus on the training data preparation process, the Tesseract integration procedure, and the post-processing techniques. The techniques described in this paper can be readily applied to build OCRs for other scripts as well.

Year	DOI	Venue
2009	10.1109/ICDAR.2009.62	ICDAR-1
Keywords	Field	DocType
bangla script processing power,tesseract integration procedure,bangla script,recognition power,bengali script,new ocr,bangla language processing,ocr engine,open source,combined ocr,optical character recognition,optical character recognizer,tesseract recognition engine,natural language processing,testing,graphical user interfaces,training data,image recognition,engines,image segmentation,tesseract,accuracy,search engines,packaging,public domain software	Training set,Computer science,Optical character recognition,Speech recognition,Software,Bengali,Natural language processing,Tesseract,Artificial intelligence,Scripting language,Optical character recognition software,Public domain software	Conference
Citations	PageRank	References
4	0.57	1
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Md. Abul Hasnat	1	10	3.72
Muttakinur Rahman Chowdhury	2	4	0.57
Mumit Khan	3	8	2.21