A lazy man's way to part-of-speech tagging - Citegraph

Paper Info

Title
A lazy man's way to part-of-speech tagging

Abstract
A statistical approach to word alignment that automatically projects part-of-speech (POS) tags is presented. The approach is referred to as the "lazy man's way" because it improves POS assignment for a resource-poor language by exploiting its similarity to a resource-rich one. This unsupervised learning method combines N-gram and Dice coefficient similarity functions to align English texts with Malay texts, thereby projecting the POS tags from English to Malay. It is a quick method that avoids the laborious effort of annotating the Malay dataset. A case study on 25 terrorism news articles written in Malay shows that leveraging pre-existing resources from a resource-rich language (English) to supplement a resource-poor language (Malay) is feasible and avoids building new text-processing tools from scratch. The system was tested on the Malay corpus of 5,413 word tokens and reached 86.87% precision, 72.56% recall, and an F1-score of 79.07%. This shows that the "lazy man's way", where a resource-poor language simply exploits the rich linguistic information available in English, increases bitext projection accuracy significantly.
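
The abstract names N-gram and Dice coefficient similarity as the basis for aligning English and Malay words, but it does not say how the two are combined. The following is only a minimal sketch under assumptions that are not taken from the paper: words are compared as sets of character bigrams, each Malay token takes the tag of its best-scoring English word, and a 0.5 threshold decides whether a tag is projected at all. The function names, the threshold, and the sample word pairs are hypothetical. The small `f1` helper checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased word."""
    w = word.lower()
    if len(w) < n:
        return {w} if w else set()
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient over n-gram sets: 2|A & B| / (|A| + |B|)."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def project_tags(tagged_en, malay_tokens, threshold=0.5):
    """Give each Malay token the tag of its most similar English word.

    Assumption: a token whose best score is below the (hypothetical)
    threshold stays untagged (None).
    """
    projected = []
    for token in malay_tokens:
        best_word, best_tag = max(tagged_en, key=lambda wt: dice(wt[0], token))
        score = dice(best_word, token)
        projected.append((token, best_tag if score >= threshold else None))
    return projected


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative data only, not drawn from the paper's corpus.
    english = [("police", "NN"), ("Malaysia", "NNP"), ("bomb", "NN")]
    malay = ["polis", "Malaysia", "bom", "jumpa"]
    print(project_tags(english, malay))
    print(round(f1(0.8687, 0.7256), 4))
```

A surface-similarity score of this kind can only link cognates, loanwords, and shared proper nouns. A word with no orthographic counterpart, such as "jumpa" in the sample data, is left untagged.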

Year	DOI	Venue
2012	10.1007/978-3-642-32541-0_9	PKAW
Keywords	Field	DocType
english text,resource-poor language,pos assignment,resource-rich language,dice coefficient similarity function,malay corpus,pos tag,malay dataset,lazy man,malay text	Rule-based machine translation,Sørensen–Dice coefficient,Computer science,Malay,Part-of-speech tagging,Exploit,Unsupervised learning,Natural language processing,Artificial intelligence,Proper noun,Recall,Machine learning	Conference
Citations	PageRank	References
1	0.54	10
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Norshuhani Zamin	1	5	3.06
Alan Oxley	2	1	0.54
Zainab Abu Bakar	3	19	7.35
Syed Ahmad Farhan	4	1	0.54
