Perplexity method on the n-gram language model based on Hadoop framework

Joint Authors

Allam, Tahani Mahmud
Abd al-Qadir, Hatim
Salam, al-Sayyid

Source

International Arab Journal of E-Technology

Issue

Vol. 4, Issue 2 (30 Jun. 2015), pp. 94-102, 9 p.

Publisher

Arab Open University

Publication Date

2015-06-30

Country of Publication

Jordan

No. of Pages

9

Main Subjects

Information Technology and Computer Science

Abstract EN

The N-gram language model is used in statistical natural language processing tasks such as machine translation and speech recognition.

Evaluating the N-gram probabilities requires a testing process.

We use a distributed computing platform built on the MapReduce algorithm and HBase tables in Hadoop.

Hadoop is an open-source implementation of the MapReduce framework.

The comparative query process depends on the NoSQL database.

The NoSQL database is used to store the testing data sets in tables with different structures.

The evaluation process applies a MapReduce algorithm to the testing process, which acts as a distributed decoder.

This decoder can process multiple testing texts together.

There are two ways to perform the MapReduce query on testing data.

The first is called the forward query and the second the hiding query.

We focus on the query response time for single-user runs on three different corpora in the N-gram model.

The perplexity method is a sound way to estimate the performance of the language model.

The perplexity of the testing set is compared with that obtained from a traditional language modeling package, the SRILM Toolkit.

The results are discussed with respect to the choice among the different HBase tables.

The results demonstrate that the proposed framework provides enhanced performance, such as lower time cost and smaller memory size.
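
As a rough illustration of the perplexity measure the abstract refers to, the sketch below computes the perplexity of a test token sequence under a bigram model in plain Python. It is not the paper's implementation: the function names `train_bigram` and `perplexity`, the single-machine setting (no Hadoop, MapReduce, or HBase), and the add-one smoothing are all assumptions made only to keep the example small and runnable.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count unigrams and bigrams in a list of training tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(test_tokens, unigrams, bigrams):
    """Perplexity of test_tokens under an add-one smoothed bigram model.

    Add-one smoothing is an illustrative assumption; the abstract does
    not say which smoothing the authors used.
    """
    if len(test_tokens) < 2:
        raise ValueError("need at least two tokens to score a bigram")
    vocab_size = len(unigrams)
    log_prob = 0.0
    count = 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        # Smoothed conditional probability P(word | prev).
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log2(p)
        count += 1
    # Perplexity is 2 raised to the negative average log2 probability.
    return 2 ** (-log_prob / count)


unigrams, bigrams = train_bigram(["a", "b", "a", "b"])
seen = perplexity(["a", "b"], unigrams, bigrams)
unseen = perplexity(["b", "b"], unigrams, bigrams)
print(seen, unseen)
```

A lower value means the model finds the test text less surprising, which is why the bigram seen in training scores lower here than the unseen one.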

American Psychological Association (APA)

Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. 2015. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology, Vol. 4, no. 2, pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

Modern Language Association (MLA)

Allam, Tahani Mahmud…[et al.]. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology, Vol. 4, no. 2 (Jun. 2015), pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

American Medical Association (AMA)

Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology. 2015. Vol. 4, no. 2, pp. 94-102.
https://search.emarefa.net/detail/BIM-647807

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references : p. 101

Record ID

BIM-647807