Using language independent and language specific features to enhance Arabic named entity recognition
Joint Authors
Ben Ajiba, Yasin
Diyab, Muna
Rosso, Paolo
Source
The International Arab Journal of Information Technology
Issue
Vol. 6, Issue 5 (30 Nov. 2009), pp. 464-472, 9 p.
Publisher
Publication Date
2009-11-30
Country of Publication
Jordan
No. of Pages
9
Main Subjects
Information Technology and Computer Science
Topics
Abstract EN
The named entity recognition (NER) task has attracted significant attention because it has been shown to improve the performance of many natural language processing applications.
More recently, there has been a surge in the development of named entity recognition systems for languages other than English.
With the relative abundance of resources for the Arabic language and a certain degree of maturation in the state of the art for processing Arabic, it is natural to see interest in developing NER systems for the language.
In this paper, we investigate the impact of using different sets of features, both language-independent and language-specific, in a discriminative machine learning framework, namely support vector machines.
We explore lexical, contextual, and morphological features across nine data sets of different genres and annotations.
We systematically measure the impact of the different features in isolation and in combination.
We achieve the highest performance using a combination of all features, with an F1 score of 82.71.
Essentially, combining language-independent features with language-specific ones yields the best performance on all the genres of text we investigate.
However, on a class level, we observe that the different classes of named entities benefit differently from the morphological features employed.
American Psychological Association (APA)
Ben Ajiba, Yasin & Diyab, Muna & Rosso, Paolo. 2009. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology, Vol. 6, no. 5, pp. 464-472.
https://search.emarefa.net/detail/BIM-10103
Modern Language Association (MLA)
Rosso, Paolo…[et al.]. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology Vol. 6, no. 5 (Nov. 2009), pp. 464-472.
https://search.emarefa.net/detail/BIM-10103
American Medical Association (AMA)
Ben Ajiba, Yasin & Diyab, Muna & Rosso, Paolo. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology. 2009. Vol. 6, no. 5, pp. 464-472.
https://search.emarefa.net/detail/BIM-10103
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 470-471
Record ID
BIM-10103