Constructing a lexicon of Arabic-English named entity using SMT and semantic linked data

Joint Authors

Hkiri, Emna
Mallat, Suhayl
Zrigui, Munir

Source

The International Arab Journal of Information Technology

Issue

Vol. 14, Issue 6 (30 Nov. 2017), 6 p.

Publisher

Zarqa University

Publication Date

2017-11-30

Country of Publication

Jordan

No. of Pages

6

Main Subjects

Information Technology and Computer Science

Abstract EN

Named entity recognition is the problem of locating and categorizing atomic entities in a given text.

In this work, we used DBpedia linked datasets and combined existing open-source tools to generate a bilingual lexicon of Named Entities (NE) from a parallel corpus.

To annotate NEs in the monolingual English corpus, we used linked data entities by mapping them to GATE gazetteers.
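The mapping from linked data entities to gazetteers might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_gazetteer`, the `dbpedia_<type>.lst` file naming, and the sample entities are all assumptions made for the example. It relies only on the plain-text GATE gazetteer layout, in which an index file (`lists.def`) names each list file with its major type and each list file holds one entry per line.

```python
from collections import defaultdict


def build_gazetteer(entities):
    """Group (label, type) pairs into GATE-style gazetteer lists.

    Returns a tuple (lists_def_text, files), where files maps each
    hypothetical list filename to its content (one entity per line).
    """
    by_type = defaultdict(set)
    for label, ne_type in entities:
        label = label.strip()
        if label:
            # A set removes duplicate labels within the same type.
            by_type[ne_type.lower()].add(label)

    files = {}
    def_lines = []
    for ne_type in sorted(by_type):
        filename = f"dbpedia_{ne_type}.lst"  # naming scheme is an assumption
        files[filename] = "\n".join(sorted(by_type[ne_type])) + "\n"
        # Index entry in the form "file:majorType".
        def_lines.append(f"{filename}:{ne_type}")
    return "\n".join(def_lines) + "\n", files


# Illustrative input; real labels and types would come from DBpedia.
sample = [
    ("Tunis", "Location"),
    ("Zarqa University", "Organization"),
    ("Tunis", "Location"),
    ("Amman", "Location"),
]
lists_def, gazetteer_files = build_gazetteer(sample)
```

Writing `lists_def` and each entry of `gazetteer_files` to disk would give a directory that a gazetteer processing resource could be pointed at.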

To translate the entities that GATE identified in the English corpus, we used Moses, a statistical machine translation system.

The Arabic-English named entity lexicon is then constructed from the Moses translation output.
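One way to picture the lexicon construction step is the sketch below. It is an illustration under stated assumptions rather than the published procedure: it assumes the English entities were written one per line to the decoder input, that the translation output contains one line per input line in the same order, and that an output identical to its input signals an entity the decoder passed through untranslated. The function name `build_lexicon` and the sample data are hypothetical.

```python
def build_lexicon(english_entities, arabic_translations):
    """Pair each English entity with its translation, line by line.

    Both arguments are lists of strings of equal length, aligned by index.
    Returns a dict mapping each English entity to one Arabic translation.
    """
    if len(english_entities) != len(arabic_translations):
        raise ValueError("input and output must have the same number of lines")

    lexicon = {}
    for en, ar in zip(english_entities, arabic_translations):
        en, ar = en.strip(), ar.strip()
        if not en or not ar:
            continue  # skip empty lines
        if en == ar:
            continue  # assumed to be an untranslated pass-through
        # Keep the first translation seen for a repeated entity.
        lexicon.setdefault(en, ar)
    return lexicon


# Illustrative data only; not taken from the published lexicon.
english = ["Tunisia", "Jordan", "Tunisia", "Foobar"]
arabic = ["تونس", "الأردن", "تونس", "Foobar"]
lexicon = build_lexicon(english, arabic)
```

Keeping only the first translation per entity is a simplification; a fuller version could retain every distinct candidate along with its frequency.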

Our method is fully automatic and aims to help Natural Language Processing (NLP) tasks such as machine translation, information retrieval, text mining, and question answering.

Our lexicon contains 48,753 Arabic-English NE pairs and is freely available for use by other researchers.

American Psychological Association (APA)

Hkiri, Emna & Mallat, Suhayl & Zrigui, Munir. 2017. Constructing a lexicon of Arabic-English named entity using SMT and semantic linked data. The International Arab Journal of Information Technology, Vol. 14, no. 6.
https://search.emarefa.net/detail/BIM-853091

Modern Language Association (MLA)

Hkiri, Emna, et al. Constructing a lexicon of Arabic-English named entity using SMT and semantic linked data. The International Arab Journal of Information Technology Vol. 14, no. 6 (Nov. 2017).
https://search.emarefa.net/detail/BIM-853091

American Medical Association (AMA)

Hkiri, Emna & Mallat, Suhayl & Zrigui, Munir. Constructing a lexicon of Arabic-English named entity using SMT and semantic linked data. The International Arab Journal of Information Technology. 2017. Vol. 14, no. 6.
https://search.emarefa.net/detail/BIM-853091

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-853091