Fast text analysis using symbol enumeration and hashing methodology

Other Title(s)

التحليل السريع للبيانات باستخدام طرق التجزئة و ترقيم الرموز

Joint Authors

Abd al-Jabbar, Safa Sami
George, Edward Luayy

Source

Iraqi Journal of Science

Issue

Vol. 58, Issue 1B (31 Mar. 2017), pp.345-354, 10 p.

Publisher

University of Baghdad College of Science

Publication Date

2017-03-31

Country of Publication

Iraq

No. of Pages

10

Abstract EN

This paper focuses on reducing the time of text processing operations by enumerating each string using a multi-hashing methodology.

Text analysis is an important subject for any system that deals with strings (sequences of characters from an alphabet) and text processing (e.g., word processors, text editors, and other text manipulation systems).

Many problems arise when dealing with operations on strings that consist of a variable number of characters (e.g., long execution time); this is due to the overhead of embedded operations (such as symbol matching and conversion operations).

The execution time largely depends on the string characteristics, especially its length (i.e., the number of characters making up the strings plus the number of words in the sentence).

In other words, the variable length of strings is an obstacle to achieving processing uniformity when manipulating strings.
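The idea of replacing variable-length strings with fixed-size numbers can be illustrated with a minimal sketch. This is not the authors' multi-hashing scheme, whose details the abstract does not give; it assumes a single hash table (a Python dict) and whitespace tokenization, and the names `SymbolTable` and `enumerate_text` are hypothetical.

```python
class SymbolTable:
    """Hypothetical helper: assigns each distinct token a small integer ID."""

    def __init__(self):
        self._ids = {}     # token -> integer ID (dict is a hash table)
        self._tokens = []  # integer ID -> token

    def encode(self, token):
        # Hash the string once; afterwards only the integer is handled.
        token_id = self._ids.get(token)
        if token_id is None:
            token_id = len(self._tokens)
            self._ids[token] = token_id
            self._tokens.append(token)
        return token_id

    def decode(self, token_id):
        return self._tokens[token_id]


def enumerate_text(text, table):
    # Assumption: lowercase the text and split on whitespace.
    return [table.encode(word) for word in text.lower().split()]


table = SymbolTable()
ids = enumerate_text("the cat saw the other cat", table)
```

Once the text is enumerated, comparing two words is a fixed-size integer comparison regardless of how long the original strings were.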

Many string matching algorithms have been introduced in the literature to deal with strings of a fixed number of characters.

In this paper, test results are provided for a number of string operations (such as simple string matching, hash indexing systems, stop-word collection, and text extraction).
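As an illustration of one of the listed operations, the sketch below filters stop words by comparing integer IDs rather than strings. It is an assumption-laden example, not the paper's implementation: the helper names `encode` and `remove_stop_words`, the whitespace tokenizer, and the sample stop-word set are all hypothetical.

```python
def encode(words, ids):
    """Map each word to an integer ID, extending the dict `ids` as needed."""
    return [ids.setdefault(word, len(ids)) for word in words]


def remove_stop_words(text, stop_words):
    ids = {}  # shared word -> ID table for stop words and text
    stop_ids = set(encode(sorted(stop_words), ids))
    token_ids = encode(text.lower().split(), ids)
    # Membership is tested on integers, not on the original strings.
    kept = [token_id for token_id in token_ids if token_id not in stop_ids]
    inverse = {token_id: word for word, token_id in ids.items()}
    return [inverse[token_id] for token_id in kept]


result = remove_stop_words("The cat is on the mat", {"the", "is", "on"})
```

Each word is hashed once when it is encoded; the filtering step then works on the integer IDs alone.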

To understand the advantage of the proposed method, these operations were applied on different sizes of text files.

A comparison is made with the results of using traditional methods that deal with strings only.

The overall results demonstrate the positive effectiveness of the proposed approach.

American Psychological Association (APA)

Abd al-Jabbar, Safa Sami & George, Edward Luayy. 2017. Fast text analysis using symbol enumeration and hashing methodology. Iraqi Journal of Science, Vol. 58, no. 1B, pp. 345-354.
https://search.emarefa.net/detail/BIM-732193

Modern Language Association (MLA)

Abd al-Jabbar, Safa Sami & George, Edward Luayy. Fast text analysis using symbol enumeration and hashing methodology. Iraqi Journal of Science Vol. 58, no. 1B (2017), pp. 345-354.
https://search.emarefa.net/detail/BIM-732193

American Medical Association (AMA)

Abd al-Jabbar, Safa Sami & George, Edward Luayy. Fast text analysis using symbol enumeration and hashing methodology. Iraqi Journal of Science. 2017. Vol. 58, no. 1B, pp. 345-354.
https://search.emarefa.net/detail/BIM-732193

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references: p. 353-354

Record ID

BIM-732193