COTA 2.0 : an automatic corrector of Tunisian Arabic social media texts

Joint Authors

Makki, Asma
Zribi, Inès
al-Lawzi, Maryam
Balghayth, Lamya Hadrich

Source

Jordanian Journal of Computers and Information Technology

Issue

Vol. 8, Issue 4 (31 Dec. 2022), pp.370-387, 18 p.

Publisher

Princess Sumaya University for Technology

Publication Date

2022-12-31

Country of Publication

Jordan

No. of Pages

18

Main Subjects

Information Technology and Computer Science

Abstract EN

In written text, orthographic noise is a common concern for NLP, especially when processing social-network comments and raw documents.

In Tunisian Arabic (TA), this noise is mainly due to the lack of standard orthographic conventions and to morphological ambiguity.

We propose to automatically normalize social-media dialect corpora following CODA-TA, the Conventional Orthography for TA.

The existing system developed for TA, «COTA Orthography 1.0», cannot handle all forms of TA.

Therefore, we propose to extend its rules and lexicons to address the peculiarities of social media dialect.

For certain words, the COTA Orthography 1.0 system offers the user several possible corrections.

In the new version, we therefore incorporated a trigram language model to select the right correction automatically.
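The candidate-selection step described above can be illustrated with a minimal sketch. It assumes a hypothetical lexicon that maps each noisy token to its candidate corrections and an add-one-smoothed trigram model trained on a toy corpus. The tokens, the lexicon, the smoothing choice, and the names `TrigramLM` and `select_correction` are illustrative assumptions. They are not the COTA 2.0 implementation, and the tokens are not real CODA-TA forms.

```python
import math
from collections import Counter
from itertools import product

BOS, EOS = "<s>", "</s>"


class TrigramLM:
    """Add-one (Laplace) smoothed trigram model; a simple stand-in, not the paper's model."""

    def __init__(self, sentences):
        self.tri = Counter()   # counts of (w2, w1, w)
        self.ctx = Counter()   # counts of the context (w2, w1)
        self.vocab = {EOS}
        for sent in sentences:
            toks = [BOS, BOS] + sent + [EOS]
            self.vocab.update(sent)
            for i in range(2, len(toks)):
                self.tri[tuple(toks[i - 2:i + 1])] += 1
                self.ctx[tuple(toks[i - 2:i])] += 1

    def logprob(self, w2, w1, w):
        # log P(w | w2 w1); unseen and out-of-vocabulary words get the same
        # add-one mass (a simplification for this sketch)
        v = len(self.vocab)
        return math.log((self.tri[(w2, w1, w)] + 1) / (self.ctx[(w2, w1)] + v))

    def score(self, sent):
        toks = [BOS, BOS] + list(sent) + [EOS]
        return sum(self.logprob(toks[i - 2], toks[i - 1], toks[i])
                   for i in range(2, len(toks)))


def select_correction(tokens, candidates, lm):
    """Return the candidate sequence with the highest trigram log-probability.

    Tokens absent from the (hypothetical) lexicon are kept unchanged.
    """
    options = [candidates.get(t, [t]) for t in tokens]
    return list(max(product(*options), key=lm.score))


# Toy, hypothetical data for illustration only.
corpus = [
    ["nheb", "barcha", "el", "qahwa"],
    ["nheb", "barcha", "el", "musique"],
]
lexicon = {"brsha": ["barsha", "barcha"], "l": ["il", "el"]}
lm = TrigramLM(corpus)

print(select_correction(["nheb", "brsha", "l", "qahwa"], lexicon, lm))
# → ['nheb', 'barcha', 'el', 'qahwa']
```

Enumerating every candidate combination is exponential in the number of ambiguous tokens. For longer sentences, a Viterbi or beam search over the candidate lattice would be the usual choice.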

Our results show that the system can reduce transcription errors by 95.72%.

American Psychological Association (APA)

Makki, Asma & Zribi, Inès & al-Lawzi, Maryam & Balghayth, Lamya Hadrich. 2022. COTA 2.0 : an automatic corrector of Tunisian Arabic social media texts. Jordanian Journal of Computers and Information Technology, Vol. 8, no. 4, pp.370-387.
https://search.emarefa.net/detail/BIM-1435988

Modern Language Association (MLA)

Makki, Asma…[et al.]. COTA 2.0 : an automatic corrector of Tunisian Arabic social media texts. Jordanian Journal of Computers and Information Technology Vol. 8, no. 4 (Dec. 2022), pp.370-387.
https://search.emarefa.net/detail/BIM-1435988

American Medical Association (AMA)

Makki, Asma & Zribi, Inès & al-Lawzi, Maryam & Balghayth, Lamya Hadrich. COTA 2.0 : an automatic corrector of Tunisian Arabic social media texts. Jordanian Journal of Computers and Information Technology. 2022. Vol. 8, no. 4, pp.370-387.
https://search.emarefa.net/detail/BIM-1435988

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: pp. 385-387

Record ID

BIM-1435988