Perplexity method on the n-gram language model based on Hadoop framework

Co-authors

Allam, Tahani Mahmud
Abd al-Qadir, Hatim
Salam, al-Sayyid

Source

International Arab Journal of E-Technology

Issue

Vol. 4, no. 2 (30 June 2015), pp. 94-102, 9 p.

Publisher

Arab Open University

Publication date

2015-06-30

Country of publication

Jordan

Number of pages

9

Main subjects

Information Technology and Computer Science

Topics

Abstract EN

The N-gram language model is used in statistical natural language processing tasks such as machine translation and speech recognition.

Evaluating the N-gram probabilities requires a testing process.

We use a distributed computing platform built on the MapReduce algorithm and HBase tables in Hadoop.

Hadoop is an open-source implementation of the MapReduce framework.

The comparative query process depends on the NoSQL database.

The NoSQL database is used to store the testing data sets in tables with different structures.

The evaluation process applies a MapReduce algorithm to the testing process, which acts as a distributed decoder.

This decoder can process multiple testing texts together.
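The idea of a decoder that scores several testing texts in one pass can be sketched in plain Python. This is a minimal illustration only, not the authors' implementation: it does not use the Hadoop or HBase APIs, and the names `BIGRAM_PROB`, `UNSEEN_PROB`, `map_phase`, and `reduce_phase` are hypothetical. A small dictionary stands in for the table lookup, and the map and reduce steps run in a single process.

```python
import math
from collections import defaultdict

# Hypothetical bigram table standing in for a table lookup:
# (history, word) -> probability. The values are made up for illustration.
BIGRAM_PROB = {
    ("<s>", "a"): 0.5,
    ("a", "b"): 0.5,
    ("<s>", "b"): 0.25,
    ("b", "a"): 0.25,
}
UNSEEN_PROB = 1e-6  # assumed floor for unseen bigrams; not a real smoothing method


def map_phase(text_id, tokens):
    """Emit (text_id, log-probability) for every bigram in one testing text."""
    history = "<s>"
    for word in tokens:
        p = BIGRAM_PROB.get((history, word), UNSEEN_PROB)
        yield text_id, math.log(p)
        history = word


def reduce_phase(pairs):
    """Group log-probabilities by text_id and convert each group to a perplexity."""
    grouped = defaultdict(list)
    for text_id, log_p in pairs:
        grouped[text_id].append(log_p)
    return {tid: math.exp(-sum(lps) / len(lps)) for tid, lps in grouped.items()}


# Two testing texts processed together, keyed by a text identifier.
texts = {"doc1": ["a", "b"], "doc2": ["b", "a"]}
emitted = [pair for tid, toks in texts.items() for pair in map_phase(tid, toks)]
result = reduce_phase(emitted)
```

Keying every emitted pair by the text identifier is what lets one job handle several testing texts at once; in a real cluster the grouping would be done by the framework's shuffle step rather than by a dictionary.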

There are two ways to perform the MapReduce query on testing data.

The first is called the forward query and the second the hiding query.

We focus on the query response time for single-user runs on three different corpora in the N-gram model.

The perplexity method is an appropriate way to estimate the performance of the language model.
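For readers unfamiliar with the measure, perplexity is commonly defined as the exponential of the average negative log-probability that the model assigns to each token of the testing set. The sketch below follows that standard definition; the function name `perplexity` and the sample probabilities are assumptions for illustration and are not taken from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from natural-log probabilities of each token given its history."""
    if not log_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(log_probs) / len(log_probs)
    return math.exp(avg_neg_log)


# Hypothetical conditional probabilities for a four-token testing sentence.
probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity([math.log(p) for p in probs])
```

A lower perplexity means the model assigns higher probability to the testing text. In this example the model is as uncertain as a uniform choice among four words at every position.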

The perplexity of the testing set is compared with that obtained from a traditional language modeling package, the SRILM Toolkit.

The results are discussed in terms of the choice among the different HBase tables.

The results demonstrate that the proposed framework provides enhanced performance, such as lower time cost and smaller memory size.

American Psychological Association (APA) citation style

Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. 2015. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology, Vol. 4, no. 2, pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

Modern Language Association (MLA) citation style

Allam, Tahani Mahmud…[et al.]. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology Vol. 4, no. 2 (Jun. 2015), pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

American Medical Association (AMA) citation style

Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology. 2015. Vol. 4, no. 2, pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

Data type

Articles

Text language

English

Notes

Includes bibliographical references: p. 101

Record number

BIM-647807