Perplexity method on the n-gram language model based on Hadoop framework
Joint Authors
Allam, Tahani Mahmud
Abd al-Qadir, Hatim
Salam, al-Sayyid
Source
International Arab Journal of E-Technology
Issue
Vol. 4, Issue 2 (30 Jun. 2015), pp.94-102, 9 p.
Publisher
Publication Date
2015-06-30
Country of Publication
Jordan
No. of Pages
9
Main Subjects
Information Technology and Computer Science
Topics
Abstract EN
The N-gram language model is used in statistical natural language processing tasks such as machine translation and speech recognition.
Evaluating the N-gram probabilities requires a testing process.
We use a distributed computing platform built on the MapReduce algorithm and HBase tables in Hadoop.
Hadoop is an open-source implementation of the MapReduce framework.
The comparative query process is dependent on the NoSQL database.
The NoSQL database is used to store the testing data sets in tables with different structures.
The evaluation process applies a MapReduce algorithm to the testing process, which acts as a distributed decoder.
This decoder can process multiple testing texts together.
There are two ways to perform the MapReduce query on testing data.
The first is called the forward query and the second the hiding query.
We focus on the query response time for single-user runs on three different corpora in the N-gram model.
Perplexity is an appropriate way to estimate the performance of the language model.
The perplexity of the testing set is compared with that obtained from a traditional language modeling package, the SRILM Toolkit.
The results are discussed with respect to the choice of HBase table structure.
The results demonstrate that the proposed framework provides enhanced performance, such as lower time cost and smaller memory size.
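For readers unfamiliar with the measure, the following is a minimal single-machine sketch of how perplexity can be computed for a bigram model. It is not the authors' Hadoop/HBase implementation: the function names, the toy corpus, and the use of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams in a token list (hypothetical helper)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    # Add-one smoothing is an assumption here; it keeps unseen bigrams
    # from receiving zero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(test_tokens, unigrams, bigrams, vocab_size):
    # PP = exp(-(1/N) * sum(log P(w_i | w_{i-1}))) over the N predicted tokens.
    log_sum = 0.0
    n = 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_sum += math.log(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
        n += 1
    return math.exp(-log_sum / n)

# Toy training corpus (illustrative only).
train = "the cat sat on the mat".split()
unigrams, bigrams = train_bigram(train)
vocab_size = len(unigrams)

pp_seen = perplexity("the cat sat".split(), unigrams, bigrams, vocab_size)
pp_unseen = perplexity("mat the on".split(), unigrams, bigrams, vocab_size)
print(pp_seen, pp_unseen)
```

A lower value means the model finds the test text less surprising. In this toy case the test text made of bigrams seen in training scores lower than the one made of unseen bigrams.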
American Psychological Association (APA)
Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. 2015. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology, Vol. 4, no. 2, pp.94-102.
https://search.emarefa.net/detail/BIM-647807
Modern Language Association (MLA)
Allam, Tahani Mahmud…[et al.]. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology Vol. 4, no. 2 (Jun. 2015), pp.94-102.
https://search.emarefa.net/detail/BIM-647807
American Medical Association (AMA)
Allam, Tahani Mahmud & Abd al-Qadir, Hatim & Salam, al-Sayyid. Perplexity method on the n-gram language model based on Hadoop framework. International Arab Journal of E-Technology. 2015. Vol. 4, no. 2, pp.94-102.
https://search.emarefa.net/detail/BIM-647807
Data Type
Journal Articles
Language
English
Notes
Includes bibliographical references: p. 101
Record ID
BIM-647807