Title
Keyword Based Information Retrieval System for Urdu Document Images.
Abstract
Various dynasties ruled the Indian subcontinent and left behind an enormous and rich cultural heritage, including intellectually rich scholarship in the form of documents written in Urdu. Providing efficient access to this knowledge requires digitizing the existing work. Beyond digitization, efficient search mechanisms are needed to give users rapid access to the queried information. In most cases, digitized documents are complemented by manually assigned tags, which are time-consuming to produce and offer only a very limited search facility. Automating the transcription of these documents using Optical Character Recognition (OCR) systems is also challenging because of the highly complex cursive nature of Urdu text. To overcome these limitations, this study introduces a keyword-spotting-based information retrieval system for document images. The proposed technique relies on two major modules: document indexing and retrieval. Images of documents are segmented into partial words (PWs, or ligatures), and identical PWs are grouped into clusters. We introduce the concept of treating each (partial) word as a unique shape, and a set of shape descriptors is extracted to characterize the PWs. The clusters of PWs are used to index a given set of documents. During retrieval, the query word presented to the system is matched against the clusters in the database, and all documents containing instances of the query word are retrieved and presented to the user. Evaluated on a set of printed Urdu documents in the Nastaliq font, the system achieved promising precision and recall rates.
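The indexing and retrieval flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the shape descriptors, the clustering method, or the matching rule, so everything here is an assumption made for illustration: partial words are taken to be already segmented and reduced to numeric descriptor vectors, clusters are formed greedily with a Euclidean distance threshold, and the names `PartialWordIndex`, `add`, `query`, and `threshold` are hypothetical.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PartialWordIndex:
    """Toy index mapping clusters of partial-word descriptors to document ids.

    Assumption: greedy threshold clustering stands in for whatever
    clustering the paper actually uses.
    """

    def __init__(self, threshold):
        self.threshold = threshold         # hypothetical matching radius
        self.centroids = []                # mean descriptor of each cluster
        self.sizes = []                    # member count of each cluster
        self.postings = defaultdict(set)   # cluster id -> document ids

    def _nearest(self, descriptor):
        """Return (cluster id, distance) of the closest centroid."""
        best, best_d = None, math.inf
        for cid, centroid in enumerate(self.centroids):
            d = distance(descriptor, centroid)
            if d < best_d:
                best, best_d = cid, d
        return best, best_d

    def add(self, doc_id, descriptor):
        """Indexing: assign one partial word to a cluster and record its document."""
        cid, d = self._nearest(descriptor)
        if cid is None or d > self.threshold:
            # No cluster is close enough, so start a new one.
            self.centroids.append(list(descriptor))
            self.sizes.append(1)
            cid = len(self.centroids) - 1
        else:
            # Update the running mean of the matched cluster.
            n = self.sizes[cid]
            self.centroids[cid] = [
                (c * n + x) / (n + 1)
                for c, x in zip(self.centroids[cid], descriptor)
            ]
            self.sizes[cid] = n + 1
        self.postings[cid].add(doc_id)
        return cid

    def query(self, descriptors):
        """Retrieval: documents containing every partial word of the query."""
        result = None
        for descriptor in descriptors:
            cid, d = self._nearest(descriptor)
            if cid is None or d > self.threshold:
                return set()  # one query partial word matches no cluster
            docs = self.postings[cid]
            result = set(docs) if result is None else result & docs
        return result or set()


# Usage with made-up two-dimensional descriptors.
index = PartialWordIndex(threshold=0.5)
index.add("doc1", [1.0, 0.0])
index.add("doc1", [0.0, 1.0])
index.add("doc2", [1.1, 0.0])
index.add("doc2", [5.0, 5.0])
print(index.query([[1.0, 0.0]]))
print(index.query([[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch a query word is a list of partial-word descriptors, and a document is returned only if each of them falls within the threshold of a cluster that occurs in that document. The sketch ignores the order and position of partial words within a document, which a real word-spotting system would likely need to check.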
Year
2015
DOI
10.1109/SITIS.2015.16
Venue
SITIS
Keywords
Information Retrieval, Word Spotting, Partial words, Clustering, Indexing
Field
Cursive, Information retrieval, Pattern recognition, Computer science, Document clustering, Precision and recall, Search engine indexing, Optical character recognition, Keyword spotting, Artificial intelligence, Concept search, Visual Word
DocType
Conference
Citations
2
PageRank
0.38
References
19
Authors
5
Name               Order  Citations  PageRank
Raashid Hussain    1      7          1.08
Haris Ahmad Khan   2      2          0.38
Imran Siddiqi      3      421        36.56
Khurram Khurshid   4      129        15.94
Asif Masood        5      137        12.91