Title
KHATT: An open Arabic offline handwritten text database
Abstract
A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research. This is especially true given the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline handwritten text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random-text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively, which enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains metadata describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and to segment paragraphs into lines have been developed. In addition, we present our experimental results on the database using two classifiers, viz. Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers worldwide for research on various handwriting-related problems such as text recognition, writer identification and verification, form analysis, pre-processing, and segmentation. Several international research groups and researchers have acquired the database for use in their research so far.
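The 70%/15%/15% division of forms described in the abstract can be illustrated with a minimal sketch. The abstract states only that the forms were divided randomly; the integer form IDs 1..1000, the fixed seed, and the function name `split_forms` are assumptions made for this illustration and are not taken from the paper or the database tools.

```python
import random


def split_forms(form_ids, seed=0, train_frac=0.70, test_frac=0.15):
    """Randomly partition form IDs into training, testing, and verification sets.

    Hypothetical helper: the seed and the ID scheme are illustrative
    assumptions, not the procedure used to build KHATT.
    """
    ids = list(form_ids)
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)

    n_train = round(len(ids) * train_frac)
    n_test = round(len(ids) * test_frac)

    return {
        "training": ids[:n_train],
        "testing": ids[n_train:n_train + n_test],
        # Whatever remains after the first two sets goes to verification.
        "verification": ids[n_train + n_test:],
    }


# Assumed numbering: one ID per form, 1000 forms in total.
splits = split_forms(range(1, 1001))
print({name: len(ids) for name, ids in splits.items()})
```

Because each form comes from a distinct writer, splitting at the form level as above also keeps the three sets writer-disjoint.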
Year: 2014
DOI: 10.1016/j.patcog.2013.08.009
Venue: Pattern Recognition
Keywords: line level, open arabic, comprehensive arabic, handwritten text, arabic handwritten text, ground truth database, minimal text paragraph, offline handwritten text database, handwritten text database, database form, arabic handwritten text recognition
Field: XML, Segmentation, Computer science, Ground truth, Paragraph, Artificial intelligence, Natural language processing, Hidden Markov model, Classifier (linguistics), Syntax, Database, Formal verification
DocType: Journal
Volume: 47
Issue: 3
ISSN: 0031-3203
Citations: 26
PageRank: 1.00
References: 48
Authors: 7
Name                     Order  Citations  PageRank
Sabri A. Mahmoud         1      443        33.96
Irfan Ahmad              2      123        10.13
Wasfi G. Al-Khatib       3      139        11.03
Mohammad Al-shayeb       4      365        28.47
Mohammad Tanvir Parvez   5      169        9.19
Volker Märgner           6      295        29.02
Gernot A. Fink           7      888        82.73