Hybridized dimensionality reduction method for machine learning based web pages classification
Author
Sabah, Thabit Sulayman
Source
Iraqi Journal of Computer, Communications and Control Engineering
Issue
Vol. 22, no. 3 (30 September 2022), pp. 97-110, 14 p.
Publisher
Publication date
2022-09-30
Country of publication
Iraq
Number of pages
14
Main subjects
Information Technology and Computer Science
Abstract (EN)
High dimensionality of the feature space is a well-known problem in the text classification and web mining domains. It is caused mainly by the large vocabulary contained within web documents.
Several methods have been applied over the years to select the most useful and important features; however, the performance of such methods can still be improved in aspects such as computational cost and accuracy.
This research presents an enhanced cosine similarity-based hybridization of two efficient feature selection methods for higher classification performance.
The reduced feature sets are generated individually using the Random Projection (RP) and Principal Component Analysis (PCA) methods, then hybridized based on the cosine similarity values between their feature vectors.
The performance of the proposed method in terms of accuracy and F-measure was tested on a dataset of web pages based on several term weighting schemes.
Compared to relevant methods, the proposed method shows significantly higher accuracy and F-measure with a smaller feature set.
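The abstract does not state the exact hybridization rule, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes the input X is a term-weighted document-term matrix (for example TF-IDF), reduces it separately with a Gaussian random projection and with PCA, and then applies a hypothetical rule: keep all PCA features and add only those RP features whose absolute cosine similarity to every PCA feature is below a threshold. The function names, the threshold value, and the rule itself are illustrative assumptions.

```python
import numpy as np

def random_projection(X, k, rng):
    # Gaussian random projection: multiply by a (d x k) matrix with N(0, 1/k) entries.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R

def pca(X, k):
    # PCA via SVD of the mean-centred data; rows of Vt are the principal directions.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def cosine_matrix(A, B):
    # Cosine similarity between every column of A and every column of B.
    A_n = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-12)
    B_n = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-12)
    return A_n.T @ B_n

def hybridize(X, k_rp, k_pca, threshold=0.5, seed=0):
    # Hypothetical hybridization rule (an assumption, not taken from the paper):
    # keep all PCA features, then add the RP features that are not
    # strongly aligned with any PCA feature.
    rng = np.random.default_rng(seed)
    Z_rp = random_projection(X, k_rp, rng)
    Z_pca = pca(X, k_pca)
    sim = np.abs(cosine_matrix(Z_rp, Z_pca))  # shape (k_rp, k_pca)
    keep = sim.max(axis=1) < threshold        # RP features dissimilar to all PCA features
    return np.hstack([Z_pca, Z_rp[:, keep]])

# Usage on synthetic data standing in for a weighted document-term matrix.
X = np.random.default_rng(1).random((40, 200))
H = hybridize(X, k_rp=10, k_pca=5)
print(H.shape)
```

The resulting matrix has between k_pca and k_pca + k_rp columns, depending on how many RP features pass the threshold, and could then be passed to any classifier.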
American Psychological Association (APA) citation style
Sabah, Thabit Sulayman. 2022. Hybridized dimensionality reduction method for machine learning based web pages classification. Iraqi Journal of Computer, Communications and Control Engineering, Vol. 22, no. 3, pp. 97-110.
https://search.emarefa.net/detail/BIM-1492789
Modern Language Association (MLA) citation style
Sabah, Thabit Sulayman. Hybridized dimensionality reduction method for machine learning based web pages classification. Iraqi Journal of Computer, Communications and Control Engineering Vol. 22, no. 3 (Sep. 2022), pp. 97-110.
https://search.emarefa.net/detail/BIM-1492789
American Medical Association (AMA) citation style
Sabah, Thabit Sulayman. Hybridized dimensionality reduction method for machine learning based web pages classification. Iraqi Journal of Computer, Communications and Control Engineering. 2022. Vol. 22, no. 3, pp. 97-110.
https://search.emarefa.net/detail/BIM-1492789
Data type
Articles
Text language
English
Notes
Includes bibliographical references: p. 108-109
Record number
BIM-1492789