Paper Info

Title
Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic

Abstract
We present a working Arabic information extraction (IE) system that is used to analyze large volumes of news texts every day to extract the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed for Arabic; instead, a highly multilingual, almost language-independent NER system was adapted to also cover Arabic. The Semitic language Arabic differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper thus describes which Arabic language-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set to make it applicable to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types.

Year	Venue	Field
2010	LREC 2010 - Seventh International Conference on Language Resources and Evaluation	Entity linking, Arabic, Computer science, Semitic languages, Named entity, Speech recognition, Information extraction, Artificial intelligence, Natural language processing, Indirect speech, Named-entity recognition, Linguistics
DocType	Citations	PageRank
Conference	6	0.47
References	Authors
11	4

Authors (4 rows)

Name	Order	Citations	PageRank
Wajdi Zaghouani	1	197	21.27
Bruno Pouliquen	2	678	58.19
Mohamed Ebrahim	3	57	4.17
Ralf Steinberger	4	949	79.70
