Named Entity Recognition and Classification for Punjabi Shahmukhi

Paper Info

Title
Named Entity Recognition and Classification for Punjabi Shahmukhi

Abstract
Named entity recognition (NER) refers to the identification of proper nouns from natural language text and classifying them into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though Shahmukhi script of the Punjabi language has been used by nearly three fourths of the Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. Specifically, a benchmark NER corpus for Shahmukhi is non-existent, which has thwarted the commencement of NER research for the Shahmukhi script. To this end, this article presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 named entities, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared the specifications of our corpus with its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behaviors of the most effective technique are discussed.

Field	Value
Year	2020
DOI	10.1145/3383306
Venue	ACM Transactions on Asian and Low-Resource Language Information Processing
Keywords	Low-resource languages, Asian languages, Punjabi, Shahmukhi, named entity recognition
DocType	Journal
Volume	19
Issue	4
ISSN	2375-4699
Citations	0
PageRank	0.34
References	0
Authors	9

Authors (9 rows)

Name	Order	Citations	PageRank
Ahmad, Muhammad Tayyab	1	0	0.34
Malik, Muhammad Kamran	2	1	0.71
Khurram Shahzad	3	165	25.77
Faisal Aslam	4	0	0.34
Asif Iqbal	5	0	0.34
Zubair Nawaz	6	0	0.68
Faisal Bukhari	7	0	0.34
Muhammad Tayyab Ahmad	8	0	0.34
Muhammad Kamran Malik	9	0	0.34
