Title
Urdu part of speech tagging using conditional random fields
Abstract
Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is an important preliminary task in natural language processing applications such as speech processing and information extraction. Urdu language processing is challenging because various Urdu POS tags behave differently in different contexts (morphosyntactic ambiguity). This paper addresses the challenge with a novel tagging approach based on linear-chain conditional random fields (CRF). Our work is the first application of a CRF approach to Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-independent and language-dependent features. The language-dependent features include the part-of-speech tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. We evaluated our approach against support vector machine (SVM) techniques for Urdu POS tagging, considered the state of the art, on two benchmark datasets. The results show that our CRF approach improves on the F-measure of prior attempts by 8.3–8.5%.
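The feature set named in the abstract (previous-word tag, current-word suffix, context word window) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the feature keys, the window size of 2, the suffix length of 2, the padding symbols, and the romanized example tokens and tags are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def token_features(
    tokens: List[str],
    i: int,
    prev_tag: Optional[str],
    window: int = 2,       # assumed context window size
    suffix_len: int = 2,   # assumed suffix length
) -> Dict[str, str]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    feats = {
        "word": word,
        # Language-dependent feature: suffix of the current word.
        "suffix": word[-suffix_len:],
        # Language-dependent feature: POS tag of the previous word.
        "prev_tag": prev_tag if prev_tag is not None else "<BOS>",
    }
    # Language-independent feature: surrounding words within the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
    return feats


def sentence_features(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Features for a whole sentence, using gold tags as the previous tag."""
    return [
        token_features(tokens, i, tags[i - 1] if i > 0 else None)
        for i in range(len(tokens))
    ]


# Hypothetical romanized tokens and tags, for illustration only.
example_tokens = ["larka", "kitabein", "parhta", "hai"]
example_tags = ["NN", "NN", "VB", "AUX"]
example_features = sentence_features(example_tokens, example_tags)
```

Feature dictionaries of this shape are the usual input to CRF toolkits. Note that in a linear-chain CRF the dependence on the previous tag is normally captured by the model's transition features during decoding, so the explicit `prev_tag` entry here only mirrors how the abstract describes the feature set.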
Year: 2019
DOI: 10.1007/s10579-018-9439-6
Venue: Language Resources and Evaluation
Keywords: Urdu, Part of speech (POS), Conditional random field (CRF), Support vector machine (SVM)
Field: Conditional random field, Speech processing, Suffix, Computer science, Support vector machine, Speech recognition, Part of speech, Information extraction, Artificial intelligence, Natural language processing, Ambiguity, Syntax
DocType: Journal
Volume: 53
Issue: 3
ISSN: 1574-0218
Citations: 0
PageRank: 0.34
References: 18
Authors: 7
Name                  Order  Citations  PageRank
Wahab Khan            1      0          0.34
Ali Daud              2      313        30.17
Jamal A. Nasir        3      8          2.17
Tehmina Amjad         4      95         7.96
Sachi Arafat          5      0          0.34
Naif Radi Aljohani    6      159        27.35
Fahd S. Alotaibi      7      0          0.34