Title
Towards Searchable Digital Urdu Libraries - A Word Spotting Based Retrieval Approach
Abstract
Libraries in South Asia hold huge collections of valuable printed documents in Urdu, and it is of interest to digitize these collections to make them more accessible. The unavailability of an OCR system for Urdu, however, limits the concept of a digital Urdu library to the scanning of documents only, offering a very limited search facility based on manually assigned tags. We address this issue by proposing a word spotting based keyword search method for information retrieval in digitized collections of printed Urdu documents. The proposed method segments Urdu text into partial words and represents each partial word by a set of features. To search for a specific word (or phrase), the user provides a query in the form of an image. The features of the partial words in the query image are compared with those already indexed, and the user is provided with a list of documents containing occurrences of the queried word. Evaluated on 50 Urdu documents, the system achieved a recall of 95.17% and a precision of 94.3%.
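The matching step described in the abstract can be illustrated with a minimal sketch. The keyword list for this record mentions dynamic time warping, so the sketch assumes each partial word is stored as a sequence of feature vectors and that a query partial word is compared with indexed ones by DTW distance. The names `dtw_distance` and `search`, the index layout, and the threshold are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch handles a single partial word rather than a whole phrase.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def search(query_features, index, threshold):
    """Return sorted ids of documents holding a partial word within `threshold`.

    `index` is an assumed layout: a list of (doc_id, feature_sequence) pairs.
    """
    hits = set()
    for doc_id, features in index:
        if dtw_distance(query_features, features) <= threshold:
            hits.add(doc_id)
    return sorted(hits)


# Toy data (hypothetical one-dimensional features, not from the paper).
index = [
    ("doc1", [(0.0,), (1.0,), (2.0,)]),
    ("doc2", [(5.0,), (5.0,)]),
]
query = [(0.0,), (1.0,), (1.0,), (2.0,)]
matches = search(query, index, threshold=1.0)
```

DTW tolerates the query and the indexed partial word having different lengths, which is why the stretched query above still aligns with the `doc1` entry at zero cost.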
Year
2011
DOI
10.1109/ICDAR.2011.270
Venue
ICDAR-1
Keywords
urdu text segmentation, specific word, partial word, printed urdu documents, query image, urdu text, searchable digital urdu library, digital urdu library, library automation, digital libraries, printed urdu document, information retrieval, ocr, towards searchable digital urdu, word spotting based retrieval approach, keyword search method, digitized collection, word spotting, retrieval approach, urdu digital libraries, limited search facility, document image processing, word spotting based keyword search, south asia libraries, urdu document, dynamic time warping, sorting, vectors, feature extraction, image segmentation, indexing
Field
Dynamic time warping, Information retrieval, Computer science, Segmentation, Phrase, Partial word, Urdu, Artificial intelligence, Natural language processing, Digital library, Spotting, Visual Word
DocType
Conference
ISSN
1520-5363
E-ISBN
978-0-7695-4520-2
ISBN
978-0-7695-4520-2
Citations
8
PageRank
0.51
References
13
Authors
3
Name | Order | Citations | PageRank
Ali Abidi | 1 | 22 | 1.43
Imran Siddiqi | 2 | 421 | 36.56
Khurram Khurshid | 3 | 129 | 15.94