Using language independent and language specific features to enhance Arabic named entity recognition

Joint Authors

Ben Ajiba, Yasin
Diyab, Muna
Rosso, Paolo

Source

The International Arab Journal of Information Technology

Issue

Vol. 6, Issue 5 (30 Nov. 2009), pp. 464-472, 9 p.

Publisher

Zarqa University

Publication Date

2009-11-30

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

The named entity recognition task has been garnering significant attention as it has been shown to help improve the performance of many natural language processing applications.

More recently, we are starting to see a surge in developing named entity recognition systems for languages other than English.

With the relative abundance of resources for the Arabic language and a certain degree of maturation in the state of the art for processing Arabic, it is natural to see interest in developing NER systems for the language.

In this paper, we investigate the impact of using different sets of features that are both language independent and language specific in a discriminative machine learning framework, namely, support vector machines.

We explore lexical, contextual, and morphological features across nine datasets of different genres and annotations.

We systematically measure the impact of the different features in isolation and in combination.

We achieve the highest performance, F1 = 82.71, using a combination of all features.

Essentially, combining language independent features with language specific ones yields the best performance on all the genres of text we investigate.

However, on a class level, we observe that the different classes of named entities benefit differently from the morphological features employed.

American Psychological Association (APA)

Ben Ajiba, Yasin & Diyab, Muna & Rosso, Paolo. 2009. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology, Vol. 6, no. 5, pp. 464-472.
https://search.emarefa.net/detail/BIM-10103

Modern Language Association (MLA)

Rosso, Paolo…[et al.]. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology Vol. 6, no. 5 (Nov. 2009), pp. 464-472.
https://search.emarefa.net/detail/BIM-10103

American Medical Association (AMA)

Ben Ajiba, Yasin & Diyab, Muna & Rosso, Paolo. Using language independent and language specific features to enhance Arabic named entity recognition. The International Arab Journal of Information Technology. 2009. Vol. 6, no. 5, pp. 464-472.
https://search.emarefa.net/detail/BIM-10103

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 470-471

Record ID

BIM-10103