Title
Searching Off-line Arabic Documents
Abstract
An abundance of historical manuscripts, journals, and scientific notes remains largely inaccessible in library archives. Manual transcription and publication of such documents is unlikely, and automatic transcription with high enough accuracy to support a traditional text search is difficult. In this work we describe a lexicon-free system for performing text queries on off-line printed and handwritten Arabic documents. Our segmentation-based approach uses gHMMs with a bigram letter transition model, and KPCA/LDA for letter discrimination. The segmentation stage is integrated with inference. We show that our method is robust to varying letter forms, ligatures, and overlaps. Additionally, we find that ignoring letters beyond the adjoining neighbors has little effect on inference and localization, which leads to a significant performance increase over standard dynamic programming. Finally, we discuss an extension to perform batch searches of large word lists for indexing purposes.
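The role of a bigram letter transition model during decoding can be illustrated with a minimal sketch. This is not the authors' system: the abstract describes gHMMs in which segmentation is integrated with inference, whereas the sketch below assumes the glyph segments are already given and runs plain Viterbi dynamic programming over them. The function name `viterbi_bigram`, the probability floor, and the toy scores are all hypothetical; the per-segment scores merely stand in for the output of a letter classifier such as KPCA/LDA.

```python
import math

def viterbi_bigram(emission_logp, bigram_logp, letters):
    """Most likely letter sequence for pre-segmented glyphs (assumed given).

    emission_logp: list of dicts, one per segment, letter -> log score
    bigram_logp:   dict (previous, current) -> log transition probability
    """
    floor = math.log(1e-6)  # assumed floor for unseen letters and bigrams
    # best[c] = (score of the best path ending in letter c, that path)
    best = {c: (emission_logp[0].get(c, floor), [c]) for c in letters}
    for scores in emission_logp[1:]:
        new_best = {}
        for cur in letters:
            # Pick the predecessor that maximizes path score plus transition.
            prev = max(
                letters,
                key=lambda p: best[p][0] + bigram_logp.get((p, cur), floor),
            )
            total = (
                best[prev][0]
                + bigram_logp.get((prev, cur), floor)
                + scores.get(cur, floor)
            )
            new_best[cur] = (total, best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]

# Toy example: the second glyph is ambiguous, so the bigram model decides.
letters = ["a", "b"]
emissions = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.5), "b": math.log(0.5)},
]
bigrams = {
    ("a", "a"): math.log(0.1), ("a", "b"): math.log(0.9),
    ("b", "a"): math.log(0.5), ("b", "b"): math.log(0.5),
}
decoded = viterbi_bigram(emissions, bigrams, letters)
```

In this toy case the classifier cannot separate the two candidates for the second glyph, and the transition scores alone select the sequence. Each step looks only at the immediately preceding letter, which is the kind of locality the abstract refers to when it mentions adjoining neighbors.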
Year
2006
DOI
10.1109/CVPR.2006.269
Venue
CVPR (2)
Keywords
batch search, letter discrimination, searching off-line arabic documents, bigram letter transition model, manual transcription, handwritten arabic document, varying letter form, text query, traditional text search, automatic transcription, adjoining neighbor, computer vision, dynamic programming, robustness, writing, indexation, indexing, training data, handwriting recognition, linear discriminant analysis
Field
Computer vision, Pattern recognition, Segmentation, Computer science, Inference, Full text search, Search engine indexing, Handwriting recognition, Robustness (computer science), Artificial intelligence, Bigram, Linear discriminant analysis
DocType
Conference
ISBN
0-7695-2597-0
Citations
32
PageRank
1.45
References
11
Authors
3
Name            Order   Citations   PageRank
Jim Chan        1       40          3.26
Celal Ziftci    2       93          7.24
D. A. Forsyth   3       9227        1138.80