Challenges in building corpora for Algerian Arabic from CMC content

Joint Authors

Bu Hania, Bashir
Umari, Muhammad

Source

El-Hakika (The Truth) Journal for Social And Human Sciences

Issue

Vol. 21, Issue 4 (31 Dec. 2022), pp. 594-617, 24 p.

Publisher

University Ahmad Draia

Publication Date

2022-12-31

Country of Publication

Algeria

No. of Pages

24

Main Subjects

Arabic language and Literature

Abstract EN

Algerian Arabic is an under-resourced Arabic dialect: few corpora and natural language processing tools have been developed for it. This is due to a variety of factors, such as its lack of written content and of a standard orthography, as well as the frequent code-switching and script-switching exhibited by its speakers. These factors make developing homogeneous corpora for the dialect more challenging than for other Arabic dialects, where such factors are less pronounced. The objective of this work is to examine the challenges and issues encountered in developing a corpus of Algerian Arabic extracted from computer-mediated communication (CMC) content, primarily content on the social media platform Facebook and the story-publishing website Wattpad.

American Psychological Association (APA)

Umari, Muhammad & Bu Hania, Bashir. 2022. Challenges in building corpora for Algerian Arabic from CMC content. El-Hakika (The Truth) Journal for Social And Human Sciences, Vol. 21, no. 4, pp. 594-617.
https://search.emarefa.net/detail/BIM-1467282

Modern Language Association (MLA)

Umari, Muhammad & Bu Hania, Bashir. Challenges in building corpora for Algerian Arabic from CMC content. El-Hakika (The Truth) Journal for Social And Human Sciences Vol. 21, no. 4 (Dec. 2022), pp. 594-617.
https://search.emarefa.net/detail/BIM-1467282

American Medical Association (AMA)

Umari, Muhammad & Bu Hania, Bashir. Challenges in building corpora for Algerian Arabic from CMC content. El-Hakika (The Truth) Journal for Social And Human Sciences. 2022. Vol. 21, no. 4, pp. 594-617.
https://search.emarefa.net/detail/BIM-1467282

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 612-617

Record ID

BIM-1467282