Particle swarm based feature selection for improving random forest classification accuracy
Thesis author
al-Wahidi, Bara Khalid Abd al-Razzaq
Thesis advisor
al-Rusan, Thamir
Abu Hamidah, Mahir Abd al-Qadir Abd al-Munim
University
Isra University
Faculty
Faculty of Information Technology
Academic department
Department of Software Engineering
University country
Jordan
Degree
Master's
Degree date
2019
English abstract
Iterating over every possible combination of features and building a decision tree for each combination takes massive processing power, especially when there are many features to select from.
The main drawback of decision tree classifiers is their tendency to overfit to a specific scenario.
The random forest classifier resolves this issue by using randomly selected features as nodes.
The problem with this approach is that it requires more time and computational power to construct the trees.
Researchers have identified this issue and worked on multiple variations of random forest to reduce the number of decision trees to be grown.
Some of the successful variations use Symmetrical Uncertainty and other methods to select the feature combination that yields the most accurate trees, and then generate a random forest from these features rather than from the entire dataset.
Others have employed the genetic algorithm in conjunction with random forests to optimize the order and appearance of the features used to build the random forest.
In this research, we employed an optimization algorithm called binary particle swarm optimization.
It is a variant of particle swarm optimization in which each particle's position is a string of bits.
We used this algorithm to pick the best features that represent a dataset as input for a random forest classifier.
We achieved high accuracy and precision while requiring minimal user interaction.
We used the Wisconsin breast cancer dataset which can be obtained from the UCI machine learning repository.
The objective in this dataset is to predict whether the patient has a benign or malignant tumor based on the attributes provided.
The other dataset we used was the Titanic disaster dataset which can also be obtained from the UCI machine learning repository.
In this dataset, the objective is to predict whether the passenger has survived or not based on the provided attributes.
We obtained an average classification accuracy of 97% and a best accuracy of 98% on the Wisconsin breast cancer dataset.
Using the same technique, we obtained 97% classification accuracy on the Titanic dataset.
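
The feature selection idea described in the abstract can be illustrated with a minimal sketch of binary particle swarm optimization. This is not the thesis's implementation. The function `binary_pso`, its parameters (`w`, `c1`, `c2`, `v_max`), and the toy fitness function are all assumptions introduced for illustration. Each particle is a 0/1 mask over the features, and the fitness function scores a mask. In a setting like the one described above, the fitness would plausibly be the accuracy of a random forest trained on the selected columns; here a simple stand-in is used so the sketch runs with the standard library only.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initial masks (positions) and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update, clamped to [-v_max, v_max].
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # The sigmoid turns the velocity into the probability
                # that this bit (feature) is switched on.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for classifier accuracy: features 0, 3 and 5 are
# treated as informative, and every other selected feature costs a little.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    if not any(mask):
        return 0.0  # an empty feature set cannot be scored
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


best_mask, best_score = binary_pso(toy_fitness, n_features=8)
selected = [i for i, bit in enumerate(best_mask) if bit]
print(selected, best_score)
```

To move from the toy fitness to the setting in the abstract, `toy_fitness` could be replaced by a function that trains a random forest on the columns where the mask is 1 and returns its validation accuracy, for example with scikit-learn's `RandomForestClassifier` and `cross_val_score`. That substitution is a suggestion, not a detail confirmed by the record.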
Main subjects
Information Technology and Computer Science
Number of pages
54
Table of contents
Table of contents.
Abstract.
Chapter One : Introduction.
Chapter Two : Literature review.
Chapter Three : Related work.
Chapter Four : Proposed approach.
Chapter Five : Implementation.
Chapter Six : Results.
Chapter Seven : Validation.
References.
American Psychological Association (APA) citation style
al-Wahidi, Bara Khalid Abd al-Razzaq. (2019). Particle swarm based feature selection for improving random forest classification accuracy. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-988682
Modern Language Association (MLA) citation style
al-Wahidi, Bara Khalid Abd al-Razzaq. Particle swarm based feature selection for improving random forest classification accuracy. (Master's thesis). Isra University. (2019).
https://search.emarefa.net/detail/BIM-988682
American Medical Association (AMA) citation style
al-Wahidi, Bara Khalid Abd al-Razzaq. (2019). Particle swarm based feature selection for improving random forest classification accuracy. (Master's thesis). Isra University, Jordan
https://search.emarefa.net/detail/BIM-988682
Text language
English
Record type
Theses and dissertations
Record number
BIM-988682