New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model

مقدم أطروحة جامعية

al-Shudayfat, Mustafa I.

مشرف أطروحة جامعية

al-Nihoud, Jihad Quball Awdah

أعضاء اللجنة

Batiha, Khalid
al-Adwan, Sahar

الجامعة

جامعة آل البيت

الكلية

كلية الأمير الحسين بن عبد الله لتكنولوجيا المعلومات

القسم الأكاديمي

قسم علوم الحاسوب

دولة الجامعة

الأردن

الدرجة العلمية

ماجستير

تاريخ الدرجة العلمية

2013

الملخص الإنجليزي

This thesis introduces new string matching algorithms to search for a pattern in a text by utilizing Horspool, Boyeer-More, First Last Character-Rami and Jehad and Markov model.

The string matching problem is defined by finding a pattern (P) in text (T) and determine their positions.

The known string matching algorithms are divided into two categories: (1) exact string matching, which must find the exact word in the text, and (2) approximate string matching which in turn will find a word close to the pattern. This study adds the weight of each character to search, and adds the weight of each 3 characters.

The algorithms depend on character frequency of appearing and substring (3 characters) frequency of appearing.

The enhancement when compared with the Boyer Moore algorithm for the three algorithms (Least occurring characters in sequence, Least occurring characters –First last character, Least occurring character-Least occurring substring) is as follows: Least occurring characters in sequence enhancement is between 38.92% up to 65.43%, Least occurring characters –First last character enhancement is between 36.07% up to 55.52%.

Least occurring character-Least occurring substring enhancement is between 71.94% up to 80.01%, The results may vary from one word to another according to their characters and their probability of appearing.

In general however, when the word's characters increase the probability of finding one of those least occurring character such as Z or V increases. The Least occurring character-Least occurring substring algorithm was the superior to all other algorithm then Least occurring characters in sequence and finally Least occurring characters –First last character algorithm when compared to Boyer-Moore algorithm. Thesis data was taken from articles of computer science; more than 1,100,660 characters were analyzed and saved to the hashtable.

Markov’s mathematical model is used to predict the next upcoming character and substring.

This research implements 3 algorithms least occurring characters in sequence algorithm, the least occurring character then first last character algorithm and the least occurring character the least occurring substring algorithm and it is primary used English language properties. The MJS-Tool (Mustafa Jeahd Searching – Tool) tested the algorithms on a variety of articles and patterns, patterns from 4-22.

More than (800) runs, the results were saved in text file.

Results show that least occurring characters in sequence algorithm, the least occurring character then first last character algorithm and the least occurring character the least occurring substring algorithm are superior in performance in comparison with the Boyer-Moore algorithm, and the best performance among all algorithms is the least occurring character the least occurring substring algorithm according to the experimental tests and more than 800 runs.

التخصصات الرئيسية

الرياضيات

الموضوعات

عدد الصفحات

67

قائمة المحتويات

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Related work.

Chapter three : Proposed algorithms.

Chapter four : The MJS-tool, Mustafa and Jehad searching tool.

Chapter five : Simulation and discussion.

Chapter six : Conclusion and future work.

References.

نمط استشهاد جمعية علماء النفس الأمريكية (APA)

al-Shudayfat, Mustafa I.. (2013). New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-415769

نمط استشهاد الجمعية الأمريكية للغات الحديثة (MLA)

al-Shudayfat, Mustafa I.. New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University. (2013).
https://search.emarefa.net/detail/BIM-415769

نمط استشهاد الجمعية الطبية الأمريكية (AMA)

al-Shudayfat, Mustafa I.. (2013). New string matching algorithms utilizing horspool, boyeer-more,FLC-RJ algorithms and markov model. (Master's theses Theses and Dissertations Master). Al albayt University, Jordan
https://search.emarefa.net/detail/BIM-415769

لغة النص

الإنجليزية

نوع البيانات

رسائل جامعية

رقم السجل

BIM-415769