Building the Oranian-English parallel corpus: methodology and compilation process

Joint Authors

Daw, Abd al-Basit
Kissi, Khalidah

Source

Journal of Languages and Translation

Issue

Vol. 4, Issue 2 (31 Dec. 2024), pp. 161-174, 14 p.

Publisher

Benbouali Hassiba University of Chlef, Faculty of Foreign Languages, Laboratory of Information and Communication Technologies in the Teaching of Foreign Languages and Translation

Publication Date

2024-12-31

Country of Publication

Algeria

No. of Pages

14

Main Subjects

Languages

Abstract EN

The scarcity of linguistic resources poses a major challenge for automated translation and processing of dialects.

These resources are crucial for natural language processing experts conducting research on dialect recognition, processing, and machine translation.

This paper describes the compilation of a dataset for a low-resource Algerian language and emphasizes the importance of developing resources for Algerian dialects.

It examines existing relevant corpora and details the creation process and unique features of the pioneering Oranian-English Parallel Corpus (OEPC).

OEPC is the first parallel corpus built from scratch that pairs sentences in an Algerian dialect with their English counterparts.

The paper outlines the criteria and steps involved in compiling a monolingual corpus for the Oranian dialect (ORN), including data sources and formats.

The ORN corpus comprises 8,500 sentences, which were then translated into English to form OEPC.

This valuable linguistic resource is a product of the ERAD project, an initiative aimed at providing NLP professionals with diverse Algerian mono-, multi-, and cross-dialectal corpora.

The paper also explains the data compilation and augmentation techniques used to expand the project's outputs.

American Psychological Association (APA)

Daw, Abd al-Basit & Kissi, Khalidah. 2024. Building the Oranian-English parallel corpus: methodology and compilation process. Journal of Languages and Translation, Vol. 4, no. 2, pp. 161-174.
https://search.emarefa.net/detail/BIM-1593587

Modern Language Association (MLA)

Daw, Abd al-Basit & Kissi, Khalidah. Building the Oranian-English parallel corpus: methodology and compilation process. Journal of Languages and Translation Vol. 4, no. 2 (2024), pp. 161-174.
https://search.emarefa.net/detail/BIM-1593587

American Medical Association (AMA)

Daw, Abd al-Basit & Kissi, Khalidah. Building the Oranian-English parallel corpus: methodology and compilation process. Journal of Languages and Translation. 2024. Vol. 4, no. 2, pp. 161-174.
https://search.emarefa.net/detail/BIM-1593587

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 171-174

Record ID

BIM-1593587