New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model

Dissertant

al-Shudayfat, Mustafa I.

Thesis advisor

al-Nihoud, Jihad Quball Awdah

Comitee Members

Batiha, Khalid
al-Adwan, Sahar

University

Al albayt University

Faculty

Prince Hussein Bin Abdullah Faculty for Information Technology

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2013

English Abstract

This thesis introduces new string matching algorithms to search for a pattern in a text by utilizing Horspool, Boyeer-More, First Last Character-Rami and Jehad and Markov model.

The string matching problem is defined by finding a pattern (P) in text (T) and determine their positions.

The known string matching algorithms are divided into two categories: (1) exact string matching, which must find the exact word in the text, and (2) approximate string matching which in turn will find a word close to the pattern. This study adds the weight of each character to search, and adds the weight of each 3 characters.

The algorithms depend on character frequency of appearing and substring (3 characters) frequency of appearing.

The enhancement when compared with the Boyer Moore algorithm for the three algorithms (Least occurring characters in sequence, Least occurring characters –First last character, Least occurring character-Least occurring substring) is as follows: Least occurring characters in sequence enhancement is between 38.92% up to 65.43%, Least occurring characters –First last character enhancement is between 36.07% up to 55.52%.

Least occurring character-Least occurring substring enhancement is between 71.94% up to 80.01%, The results may vary from one word to another according to their characters and their probability of appearing.

In general however, when the word's characters increase the probability of finding one of those least occurring character such as Z or V increases. The Least occurring character-Least occurring substring algorithm was the superior to all other algorithm then Least occurring characters in sequence and finally Least occurring characters –First last character algorithm when compared to Boyer-Moore algorithm. Thesis data was taken from articles of computer science; more than 1,100,660 characters were analyzed and saved to the hashtable.

Markov’s mathematical model is used to predict the next upcoming character and substring.

This research implements 3 algorithms least occurring characters in sequence algorithm, the least occurring character then first last character algorithm and the least occurring character the least occurring substring algorithm and it is primary used English language properties. The MJS-Tool (Mustafa Jeahd Searching – Tool) tested the algorithms on a variety of articles and patterns, patterns from 4-22.

More than (800) runs, the results were saved in text file.

Results show that least occurring characters in sequence algorithm, the least occurring character then first last character algorithm and the least occurring character the least occurring substring algorithm are superior in performance in comparison with the Boyer-Moore algorithm, and the best performance among all algorithms is the least occurring character the least occurring substring algorithm according to the experimental tests and more than 800 runs.

Main Subjects

Mathematics

Topics

No. of Pages

67

Table of Contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Related work.

Chapter three : Proposed algorithms.

Chapter four : The MJS-tool, Mustafa and Jehad searching tool.

Chapter five : Simulation and discussion.

Chapter six : Conclusion and future work.

References.

American Psychological Association (APA)

al-Shudayfat, Mustafa I.. (2013). New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-415769

Modern Language Association (MLA)

al-Shudayfat, Mustafa I.. New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University. (2013).
https://search.emarefa.net/detail/BIM-415769

American Medical Association (AMA)

al-Shudayfat, Mustafa I.. (2013). New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-415769

Language

English

Data Type

Arab Theses

Record ID

BIM-415769