Title
End-to-End Trainable Thai OCR System Using Hidden Markov Models
Abstract
In this paper we present an end-to-end trainable Optical Character Recognition (OCR) system for recognizing machine-printed text in Thai documents. The system is based on a script-independent methodology using hidden Markov models. It provides an integrated workflow that runs from annotation and transcription of training images to performing OCR on new images with models trained on the transcribed images. We demonstrate the efficacy of the system by rapidly configuring our OCR engine for the Thai script. We present experimental results on Thai documents that highlight the specific challenges posed by the Thai script, and we analyze recognition performance as a function of the amount of training data.
Year: 2008
DOI: 10.1109/DAS.2008.76
Venue: Document Analysis Systems
Keywords: hidden markov models, end-to-end trainable thai ocr, training data, end-to-end trainable, optical character recognition, ocr engine, thai document, end-to-end ocr system, thai script, training image, transcribed training image, annotation, engines, end to end, text analysis, layout, transcription, graphical user interfaces, hidden markov model
Field: Training set, Annotation, Computer science, End-to-end principle, Optical character recognition, Speech recognition, Graphical user interface, Artificial intelligence, Natural language processing, Hidden Markov model, Document handling, Workflow
DocType: Conference
ISBN: 978-0-7695-3337-7
Citations: 2
PageRank: 0.40
References: 10
Authors: 4
Name, Order, Citations, PageRank
K. Krstovski, 1, 28, 6.24
Ehry MacRostie, 2, 58, 5.73
Rohit Prasad, 3, 465, 39.06
Premkumar Natarajan, 4, 874, 79.46