Analysis and development of Urdu POS tagged corpus - Citegraph

Paper Info

Title
Analysis and development of Urdu POS tagged corpus

Abstract
In this paper, two Urdu corpora (of 110K and 120K words), tagged with different POS tagsets, are used to train the TnT and Tree taggers. An error analysis of both taggers is carried out to identify frequent tagging confusions. Based on this analysis and on the syntactic structure of Urdu, a more refined tagset is derived. The existing corpora are re-tagged with the new tagset to form a single 230K-word corpus, and the TnT tagger is retrained. The results show that tagging accuracy improves to 94.2% for the individual corpora and to 91% for the merged corpus. Implications of these results are discussed.
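The abstract describes mapping two differently tagged corpora onto one refined tagset, merging them, and running an error analysis on tagger output. The Python sketch below illustrates only that bookkeeping, under loudly stated assumptions: the tag names, the two mapping tables, the placeholder tokens and the "predicted" tags (which stand in for TnT or Tree tagger output) are all hypothetical, no tagger is trained, and the paper's actual tagsets and mappings are not reproduced here.

```python
from collections import Counter

# HYPOTHETICAL mappings from two source tagsets onto one unified tagset.
# Tag names are illustrative only; they are NOT the tagsets used in the paper.
TAGSET_A_TO_UNIFIED = {"NN": "NN", "NNP": "PN", "VB": "VB", "JJ": "ADJ"}
TAGSET_B_TO_UNIFIED = {"NN": "NN", "PN": "PN", "VBF": "VB", "ADJ": "ADJ"}


def retag(corpus, mapping):
    """Map every (word, tag) pair onto the unified tagset (KeyError if a tag is unmapped)."""
    return [(word, mapping[tag]) for word, tag in corpus]


def accuracy(gold, predicted):
    """Token-level tagging accuracy: fraction of positions where the tags agree."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must have the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def top_confusions(gold, predicted, n=3):
    """Most frequent (gold, predicted) tag pairs among the tagging errors."""
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return errors.most_common(n)


# Placeholder tokens "w1".."w6" stand in for Urdu words (hypothetical data).
corpus_a = [("w1", "NNP"), ("w2", "NN"), ("w3", "VB")]
corpus_b = [("w4", "ADJ"), ("w5", "NN"), ("w6", "VBF")]

# Re-tag both corpora with the unified tagset, then merge them into one corpus.
merged = retag(corpus_a, TAGSET_A_TO_UNIFIED) + retag(corpus_b, TAGSET_B_TO_UNIFIED)
gold = [tag for _, tag in merged]

# Hypothetical tagger output for the merged corpus (two PN/NN confusions).
predicted = ["NN", "NN", "VB", "ADJ", "PN", "VB"]

print(accuracy(gold, predicted))        # 4 of 6 tags correct
print(top_confusions(gold, predicted))  # the PN/NN confusions in both directions
```

A plain dictionary lookup assumes that each source tag maps to exactly one refined tag. Where a refined distinction depends on context, a one-to-one table is not enough and the affected tokens would need manual or rule-based re-tagging instead.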

Year: 2009
Venue: ALR7@IJCNLP
Keywords: new tagset, Urdu POS, merged corpus, error analysis, Tree taggers, different POS tagsets, single corpus, TnT tagger, refined tagset, tagging accuracy, individual corpus
Field: Computer science, Speech recognition, Urdu, Artificial intelligence, Natural language processing, Syntactic structure
DocType: Conference
Citations: 10
PageRank: 1.01
References: 3
Authors: 3

Authors

Name	Order	Citations	PageRank
Ahmed Muaz	1	10	1.01
Aasim Ali	2	10	1.69
Sarmad Hussain	3	96	12.15
