Google N-gram viewer does not include Arabic corpus! Towards N-gram viewer for Arabic corpus

Co-authors

Zarur, Muhammad
al-Smadi, Izzat

Source

The International Arab Journal of Information Technology

Issue

Vol. 15, no. 5 (30 September 2018), pp. 785-794, 10 p.

Publisher

Zarqa University

Publication date

2018-09-30

Country of publication

Jordan

Number of pages

10

Main subjects

Information Technology and Computer Science

Abstract (EN)

Google N-gram viewer is one of Google's recently released services.

Google archived or digitized a large number of books in different languages.

Google populated the corpora from over 5 million books published up to 2008.

The service lets users enter words or phrases as queries.

The tool then charts how frequently the query words were used over time.

Although Arabic is one of the most widely spoken languages in the world, it is not among the corpora indexed by the Google N-gram viewer.

This research work discusses the development of a large Arabic corpus and its indexing with N-grams so that it can be included in Google N-gram viewer.

A showcase is presented in which a dataset is built to begin digitizing Arabic content and preparing it for incorporation into Google N-gram viewer.

One of the major goals of including Arabic content in Google N-gram is to enrich Arabic public content, which has been very limited in comparison with the number of people who speak Arabic.

We believe that the adoption of the Arabic language by Google N-gram viewer can significantly benefit researchers in different fields related to the Arabic language and the social sciences.
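
The abstract mentions indexing a corpus with N-grams and charting how often a query occurs over time. A minimal Python sketch of that idea follows, assuming whitespace tokenization and a toy corpus of (year, text) pairs. The function names (`tokenize`, `ngrams`, `build_index`, `relative_frequency`) and the sample sentences are hypothetical illustrations, not the authors' implementation or Google's pipeline.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split on whitespace (assumption: a real Arabic pipeline would also
    normalize letter forms and strip diacritics before counting)."""
    return text.split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(documents, n):
    """Map each publication year to a Counter of its n-gram occurrences."""
    index = defaultdict(Counter)
    for year, text in documents:
        index[year].update(ngrams(tokenize(text), n))
    return index


def relative_frequency(index, query):
    """For each year, the share of all indexed n-grams equal to the query."""
    key = tuple(tokenize(query))
    series = {}
    for year in sorted(index):
        total = sum(index[year].values())
        series[year] = index[year][key] / total if total else 0.0
    return series


# Hypothetical toy corpus: (publication year, text)
documents = [
    (2001, "اللغة العربية لغة جميلة"),
    (2001, "تعلم اللغة العربية"),
    (2002, "اللغة الإنجليزية"),
]

index = build_index(documents, n=2)
series = relative_frequency(index, "اللغة العربية")
print(series)
```

Dividing by the yearly total, rather than reporting raw counts, keeps years with more digitized text from dominating the chart.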

American Psychological Association (APA) citation style

al-Smadi, Izzat & Zarur, Muhammad. 2018. Google N-gram viewer does not include Arabic corpus! Towards N-gram viewer for Arabic corpus. The International Arab Journal of Information Technology, Vol. 15, no. 5, pp. 785-794.
https://search.emarefa.net/detail/BIM-839140

Modern Language Association (MLA) citation style

al-Smadi, Izzat & Zarur, Muhammad. Google N-gram viewer does not include Arabic corpus! Towards N-gram viewer for Arabic corpus. The International Arab Journal of Information Technology Vol. 15, no. 5 (Sep. 2018), pp. 785-794.
https://search.emarefa.net/detail/BIM-839140

American Medical Association (AMA) citation style

al-Smadi, Izzat & Zarur, Muhammad. Google N-gram viewer does not include Arabic corpus! Towards N-gram viewer for Arabic corpus. The International Arab Journal of Information Technology. 2018. Vol. 15, no. 5, pp. 785-794.
https://search.emarefa.net/detail/BIM-839140

Record type

Articles

Text language

English

Notes

Includes bibliographical references: p. 793-794

Record number

BIM-839140