Title
Baseline Dependent Percentile Features for Offline Arabic Handwriting Recognition
Abstract
Handwritten text in Arabic and other languages exhibits significant variation in the slant and baseline of characters, both across words and within a single word. Since the baseline has no precise mathematical definition, existing approaches use heuristics to first identify a set of baseline-relevant pixels and then fit lines or curves through them. However, for statistical features such as the percentile features used in our system, an approximate curve close to the baseline is sufficient to normalize the features. We therefore propose a two-stage approach to estimating an approximate baseline: first we segment the text line into a set of components, and then we estimate the baseline in each component using either of two methods, max projection or a smoothed centroid line. We incorporate the computed baseline into the percentile feature computation of the BBN Byblos OCR system for an Arabic offline handwriting recognition task. The new features yield a 1% absolute (3.1% relative) reduction in word error rate on a large test set of 15K handwritten Arabic words, which is statistically significant with p-value…
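The two-stage idea can be sketched on a binary text-line image as follows. This is an illustration under stated assumptions, not the Byblos implementation: the abstract does not say how components are found, so the hypothetical `split_components` simply takes maximal runs of inked columns; each frame is a single pixel column; and percentile positions are expressed relative to the estimated baseline and divided by image height, a normalization chosen here for illustration. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed convention: a list of pixel rows, top to bottom, with 1 = ink, 0 = background.
Image = List[List[int]]


def split_components(img: Image) -> List[Tuple[int, int]]:
    """Hypothetical segmentation: half-open column ranges [start, end)
    covering maximal runs of columns that contain at least one ink pixel."""
    width = len(img[0])
    inked = [any(row[c] for row in img) for c in range(width)]
    comps, start = [], None
    for c, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            comps.append((start, c))
            start = None
    if start is not None:
        comps.append((start, width))
    return comps


def max_projection_baseline(img: Image, start: int, end: int) -> float:
    """Max projection: the row with the most ink inside the component
    (peak of the horizontal projection profile; ties go to the upper row)."""
    profile = [sum(row[start:end]) for row in img]
    return float(max(range(len(profile)), key=profile.__getitem__))


def smoothed_centroid_line(img: Image, start: int, end: int, window: int = 3) -> List[float]:
    """Smoothed centroid line: the mean ink row of each column,
    smoothed with a centered moving average of the given window."""
    cents = []
    for c in range(start, end):
        rows = [r for r, row in enumerate(img) if row[c]]
        cents.append(sum(rows) / len(rows))  # every column of a component has ink
    half = window // 2
    smoothed = []
    for i in range(len(cents)):
        lo, hi = max(0, i - half), min(len(cents), i + half + 1)
        smoothed.append(sum(cents[lo:hi]) / (hi - lo))
    return smoothed


def percentile_positions(column: Sequence[int], percents: Sequence[float]) -> List[int]:
    """For each percentile p, the first row at which the cumulative ink count
    from the top reaches p% of the column's total ink."""
    total = sum(column)
    out = []
    for p in percents:
        target = p / 100.0 * total
        acc = 0
        for r, v in enumerate(column):
            acc += v
            if acc > 0 and acc >= target:
                out.append(r)
                break
    return out


def baseline_percentile_features(img: Image, percents=(10, 50, 90),
                                 method: str = "centroid") -> List[List[float]]:
    """Per-column percentile positions measured relative to the estimated
    baseline and divided by image height (an assumed normalization)."""
    height = len(img)
    feats = []
    for start, end in split_components(img):
        if method == "projection":
            base = [max_projection_baseline(img, start, end)] * (end - start)
        else:
            base = smoothed_centroid_line(img, start, end)
        for i, c in enumerate(range(start, end)):
            column = [row[c] for row in img]
            feats.append([(r - base[i]) / height for r in percentile_positions(column, percents)])
    return feats


# Toy 4x5 line image with two components separated by the empty column 2.
sample = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
]

if __name__ == "__main__":
    print(split_components(sample))
    print(baseline_percentile_features(sample, method="projection"))
```

Because the positions are measured from a per-component baseline rather than from the top of the line image, a word that drifts up or down produces similar feature values, which is the motivation the abstract gives for needing only an approximate baseline. The max projection variant yields one flat baseline per component, while the smoothed centroid variant follows the ink column by column.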
Year
2011
DOI
10.1109/ICDAR.2011.74
Venue
ICDAR-1
Keywords
handwritten arabic word,computed baseline,approximate baseline,handwritten text,baseline relevant pixel,centroid line,approximate curve,absolute gain,offline arabic handwriting recognition,baseline dependent percentile features,bbn byblos ocr system,arabic offline handwriting recognition,statistical analysis,feature extraction,estimation,merging,hidden markov models,handwriting recognition,optical character recognition
Field
Pattern recognition,Computer science,Word error rate,Optical character recognition,Handwriting recognition,Feature extraction,Speech recognition,Feature (machine learning),Artificial intelligence,Percentile,Intelligent word recognition,Test set
DocType
Conference
ISSN
1520-5363
Citations
1
PageRank
0.39
References
10
Authors
6
Name | Order | Citations | PageRank
Premkumar Natarajan | 1 | 874 | 79.46
David Belanger | 2 | 13 | 2.81
Rohit Prasad | 3 | 465 | 39.06
Matin Kamali | 4 | 25 | 2.39
Krishna Subramanian | 5 | 149 | 15.40
Prem Natarajan | 6 | 18 | 2.62