Prediction of missing data technique to improve big data classification
Thesis author
Thesis supervisor
University
Isra University
Faculty
Faculty of Information Technology
Academic department
Department of Software Engineering
University country
Jordan
Degree
Master's
Degree date
2020
English abstract
Designing machine-learning-based early prediction systems for diabetes is an emerging research area, growing steadily as the number of diabetes cases rises worldwide.
Missing values in medical datasets in general, and in diabetes datasets in particular, are a problem for machine learning models and case studies.
An imputation method is therefore needed to estimate the missing values as a preprocessing step, applied before the cases in the dataset are classified.
In this study, a new imputation algorithm based on the Firefly Algorithm (FA) is proposed, called the Imputation Algorithm based on Firefly Algorithm (IFA).
To evaluate the proposed IFA, a classifier serves as the fitness function: it yields the classification accuracy on the generated dataset, which is to be maximized.
The accuracy is obtained using three different classifiers: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naïve Bayes Classifier (NBC).
The Pima Indian Diabetes Disease (PIDD) dataset is the main dataset used in this study for estimating the missing values and evaluating IFA.
The proposed algorithm is evaluated in two experiments: the first validates the generated datasets using k-fold cross-validation (k = 5).
The second uses holdout validation, in which the generated dataset is divided into a training set (65%) and a testing set (35%).
The results showed that IFA-SVM ranked best based on the average of ten runs, while IFA-NBC ranked worst.
Moreover, IFA with every classifier achieved better accuracy than the four popular techniques it was compared against, indicating that, in this study, an optimization algorithm used for imputation outperforms the statistical methods.
In conclusion, FA can be used to estimate missing values in PIDD and in medical datasets in general.
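The abstract describes scoring a candidate imputation by the accuracy a classifier reaches on the filled-in dataset, validated either by 5-fold cross-validation or by a 65%/35% holdout split. The following is only a minimal sketch of that scoring step, not the thesis's implementation: the function names, the plain KNN classifier with k = 3 neighbours, the unshuffled splits, and the toy data are all assumptions made for illustration, and the firefly search that would propose the candidates is not shown.

```python
import math
from collections import Counter

def fill_missing(rows, candidate):
    """Copy rows, replacing each None (row-major order) with the next candidate value."""
    values = iter(candidate)
    return [[next(values) if v is None else v for v in row] for row in rows]

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_idx, test_idx, X, y, k=3):
    """Fraction of test points whose KNN prediction matches the true label."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)

def kfold_accuracy(X, y, folds=5, k=3):
    """Mean accuracy over interleaved folds (row i belongs to fold i % folds)."""
    n = len(X)
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        scores.append(accuracy(train_idx, test_idx, X, y, k))
    return sum(scores) / folds

def holdout_accuracy(X, y, train_frac=0.65, k=3):
    """Train on the first 65% of rows, test on the remaining 35% (no shuffling)."""
    cut = int(len(X) * train_frac)
    return accuracy(range(cut), range(cut, len(X)), X, y, k)

def fitness(candidate, rows, y):
    """Score a candidate imputation: higher cross-validated accuracy is better."""
    return kfold_accuracy(fill_missing(rows, candidate), y)

# Toy data (made up, not PIDD): two well-separated clusters with two missing cells.
rows, y = [], []
for i in range(10):
    rows += [[1.0 + 0.1 * i, 1.0], [9.0 - 0.1 * i, 9.0]]
    y += [0, 1]
rows[0][1] = None  # the removed value was 1.0
rows[1][0] = None  # the removed value was 9.0

good = [1.0, 9.0]  # candidate that restores the removed values
bad = [9.0, 1.0]   # candidate that moves row 1 far from its own class
print(fitness(good, rows, y), fitness(bad, rows, y))
print(holdout_accuracy(fill_missing(rows, good), y))
```

An optimizer such as FA would call `fitness` repeatedly and keep the candidates that score highest; swapping `kfold_accuracy` for `holdout_accuracy` inside `fitness` gives the second validation scheme.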
Main specializations
Information Technology and Computer Science
Topics
Number of pages
100
Table of contents
Table of contents.
Abstract.
Chapter One : Introduction.
Chapter Two : Background and related works.
Chapter Three : Proposed algorithm.
Chapter Four : Results analysis.
Chapter Five : Conclusion and future works.
References.
American Psychological Association (APA) citation style
Husayn, Huda. (2020). Prediction of missing data technique to improve big data classification. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-985125
Modern Language Association (MLA) citation style
Husayn, Huda. Prediction of missing data technique to improve big data classification. (Master's thesis). Isra University. (2020).
https://search.emarefa.net/detail/BIM-985125
American Medical Association (AMA) citation style
Husayn, Huda. (2020). Prediction of missing data technique to improve big data classification. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-985125
Text language
English
Data type
University theses
Record number
BIM-985125