A Dictionary Based Urdu Word Segmentation Using Maximum Matching Algorithm for Space Omission Problem - Citegraph

Paper Info

Title
A Dictionary Based Urdu Word Segmentation Using Maximum Matching Algorithm for Space Omission Problem

Abstract
Word segmentation is the foremost step in any Natural Language Processing system. It means dividing a sentence into the words it consists of. Urdu was selected for this research because very little work has been done on it. In Urdu, the space cannot be relied on to mark word boundaries because it is not used consistently. Urdu word segmentation differs from that of other Asian languages in that it involves both the Space Omission and the Space Insertion problems. This paper discusses these problems and suggests a technique that solves both of them. It applies simple, well-established techniques in a different way to develop an efficient segmentation algorithm. Morphological analysis of Urdu text is also taken into account. A dictionary is used to verify and identify Urdu words. The work was tested on words collected from the Geo, Jang, and BBC news sites and other documents available online. The proposed algorithm was tested on 11,995 words, of which 97.2% were segmented correctly.
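
As a rough illustration of the general idea named in the title, the sketch below shows textbook forward maximum matching against a dictionary for text whose spaces have been omitted. It is not the authors' algorithm: the abstract does not give its details, and the morphological analysis and space insertion handling it mentions are not modelled here. The function name `max_match`, the toy dictionary, and the single-character fallback for unknown text are all assumptions made for this example.

```python
def max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over a string with no spaces.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # Assumed fallback: no entry matches, so emit one character
            # as an unknown token and move on.
            words.append(text[i])
            i += 1
    return words


# Toy data (assumed, not from the paper's test set).
parts = ["پاکستان", "زندہ", "باد"]
toy_dictionary = set(parts) | {"پاک"}
unspaced = "".join(parts)  # simulate space omission

print(max_match(unspaced, toy_dictionary))
```

Because the longest candidate is tried first, the shorter entry "پاک" does not split "پاکستان". The same greediness can also pick a wrong long match, which is one reason a practical system would need more than this sketch.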

Year: 2012
DOI: 10.1109/IALP.2012.11
Venue: IALP
DocType: Conference
ISSN: 2159-1962
Citations: 1
PageRank: 0.36
References: 1
Authors: 2
Keywords: research purpose urdu, efficient segmentation algorithm, space omission, word segmentation, space omission problem, urdu words, word boundary, urdu text, urdu word segmentation, maximum matching algorithm, space insertion problem, urdu space, pattern matching, internet, electronic publishing, natural language processing, dictionaries, text analysis
Field: Computer science, Artificial intelligence, Natural language processing, Word processing, The Internet, Segmentation, Algorithm, Matching (graph theory), Text segmentation, Speech recognition, Urdu, Pattern matching, Sentence

Authors (2 rows)

Name	Order	Citations	PageRank
Rabiya Rashid	1	1	0.36
Seemab Latif	2	27	5.71