A deep learning approach for the Romanized Tunisian dialect identification

Joint Authors

Yunus, Jihene
Ashur, Hadhemi
Suwaysi, Aminah
Ferchichi, Ahmad

Source

The International Arab Journal of Information Technology

Issue

Vol. 17, Issue 6 (30 Nov. 2020), pp. 935-946, 12 p.

Publisher

Zarqa University Deanship of Scientific Research

Publication Date

2020-11-30

Country of Publication

Jordan

No. of Pages

12

Main Subjects

Information Technology and Computer Science

Abstract EN

Language identification is an important task in natural language processing that consists of determining the language of a given text.

It has increasingly piqued the interest of researchers over the past few years, especially for code-switched informal textual content.

This paper focuses on the identification of the Romanized user-generated Tunisian dialect on the social web.

We segment and annotate a corpus extracted from social media and propose a deep learning approach for the identification task.

A Bidirectional Long Short-Term Memory neural network with Conditional Random Fields decoding (BLSTM-CRF) is used.

For word embeddings, we combine a word-character BLSTM vector representation with FastText embeddings, which take character n-gram features into consideration.

The overall accuracy obtained is 98.65%.
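
As an illustration of the character n-gram features mentioned in the abstract, the following minimal sketch enumerates the subword units of a single token in the FastText style, with `<` and `>` marking the word boundaries. The function name `char_ngrams`, the 3 to 6 length range (FastText's usual default), and the sample word are assumptions made for illustration; this record does not state the settings or data used in the paper. FastText itself goes further by hashing each n-gram and summing the corresponding vectors, which is not shown here.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, FastText-style.

    The length range n_min..n_max is an assumed default, not a value
    taken from the paper.
    """
    # Boundary markers let prefixes and suffixes differ from
    # word-internal n-grams with the same letters.
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Illustrative Romanized token (hypothetical input, not from the paper's corpus).
print(char_ngrams("barcha", 3, 3))
# → ['<ba', 'bar', 'arc', 'rch', 'cha', 'ha>']
```

Because spelling in Romanized dialect text is not standardized, variants of the same word tend to share many of these n-grams, which is one reason subword features are attractive for this kind of input.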

American Psychological Association (APA)

Yunus, Jihene & Ashur, Hadhemi & Suwaysi, Aminah & Ferchichi, Ahmad. 2020. A deep learning approach for the Romanized Tunisian dialect identification. The International Arab Journal of Information Technology, Vol. 17, no. 6, pp. 935-946.
https://search.emarefa.net/detail/BIM-1434011

Modern Language Association (MLA)

Yunus, Jihene…[et al.]. A deep learning approach for the Romanized Tunisian dialect identification. The International Arab Journal of Information Technology Vol. 17, no. 6 (Nov. 2020), pp. 935-946.
https://search.emarefa.net/detail/BIM-1434011

American Medical Association (AMA)

Yunus, Jihene & Ashur, Hadhemi & Suwaysi, Aminah & Ferchichi, Ahmad. A deep learning approach for the Romanized Tunisian dialect identification. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 6, pp. 935-946.
https://search.emarefa.net/detail/BIM-1434011

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 943-946

Record ID

BIM-1434011