Issues of dialectal Saudi Twitter corpus
Author
al-Ruwayli, Mushrif
Source
The International Arab Journal of Information Technology
Issue
Vol. 17, Issue 3 (31 May 2020), pp. 367-374, 8 p.
Publisher
Zarqa University Deanship of Scientific Research
Publication Date
2020-05-31
Country of Publication
Jordan
No. of Pages
8
Main Subjects
Information Technology and Computer Science
Abstract EN
Text mining research relies heavily on the availability of a suitable corpus.
This paper presents a dialectal Saudi corpus that contains 207,452 tweets generated by Saudi Twitter users.
In addition, the Saudi tweets dataset was compared with an Egyptian Twitter corpus and an Arabic top news raw corpus (representing Modern Standard Arabic (MSA)) in various aspects, such as the differences between formal and colloquial texts.
Moreover, the issues and phenomena found in this type of dataset, such as shortening, concatenation, colloquial language, compounding, foreign language, spelling errors and neologisms, were investigated.
American Psychological Association (APA)
al-Ruwayli, Mushrif. 2020. Issues of dialectal Saudi Twitter corpus. The International Arab Journal of Information Technology, Vol. 17, no. 3, pp. 367-374.
https://search.emarefa.net/detail/BIM-962349
Modern Language Association (MLA)
al-Ruwayli, Mushrif. Issues of dialectal Saudi Twitter corpus. The International Arab Journal of Information Technology Vol. 17, no. 3 (May 2020), pp. 367-374.
https://search.emarefa.net/detail/BIM-962349
American Medical Association (AMA)
al-Ruwayli, Mushrif. Issues of dialectal Saudi Twitter corpus. The International Arab Journal of Information Technology. 2020. Vol. 17, no. 3, pp. 367-374.
https://search.emarefa.net/detail/BIM-962349
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references : p. 373-374
Record ID
BIM-962349