Text categorization using polynomial networks

Dissertation author

al-Tahrawi, Mayy M.

Dissertation supervisor

Abu Zitar, Raid

Committee members

Khanfar, Khalid
al-Shalabi, Riyad
Hadi, Husayn

University

Arab Academy for Financial and Banking Sciences

Faculty

Faculty of Information Systems and Technology

University country

Jordan

Degree

Ph.D.

Degree date

2005

English abstract

Text Categorization (TC) refers to the task of assigning zero or more predefined categories to an unseen text on the basis of the contents of this text.

Learning how to classify an object into zero or more of a pre-specified set of categories is an intelligent task that has drawn great interest in computer science for years.

In fact, the ability of machines to learn how to classify objects is of great help in many applications.

Decision making, e-mail filtering, web-page classification, and topic spotting are some of the applications that urgently need automated TC, because of the huge amount of online information that becomes available every day and must be filtered, routed, or searched as quickly as possible.

Traditionally, polynomial networks have been difficult to use in TC because of the large datasets used in this field, the heavy resource demands of these techniques, and their low classification accuracy.

Nevertheless, polynomials have many properties that make them very attractive for use as a machine learning approach in TC.

In this thesis, we propose the use of polynomial networks as a supervised machine learning approach for TC.

Direct comparisons of our proposed polynomial classifier against three other well-known classification algorithms, namely k-nearest neighbors (kNN), naive Bayes (NB), and radial basis function (RBF) networks, show that polynomial classifiers can outperform these well-known, high-performing text classifiers. The comparisons were run on Reuters-21578 and 20Newsgroups, the two benchmark datasets in the field of TC, and on a third dataset of our own creation.

More importantly, this high performance of the polynomial classifiers is achieved in one shot (non-iteratively) and using just 0.25-0.50% of the features of the original corpora.
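To illustrate what one-shot (non-iterative) training can look like, here is a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact formulation, so the degree-2 polynomial expansion, the least-squares fit, the 0.5 decision threshold, and the names `poly_expand`, `fit_one_shot`, and `predict` are all assumptions made for illustration.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly_expand(X, degree=2):
    """Map each row of X to all monomials of its features up to `degree`.

    The constant term 1 is included as the first column.
    """
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_one_shot(X, Y, degree=2):
    """Solve for all class weight vectors in a single least-squares step.

    X: (n_docs, n_features) reduced feature matrix (assumed input format).
    Y: (n_docs, n_classes) 0/1 indicator matrix; a row may contain
       zero, one, or several ones.
    """
    P = poly_expand(X, degree)
    W, *_ = np.linalg.lstsq(P, Y, rcond=None)  # no iterative training loop
    return W


def predict(X, W, degree=2, threshold=0.5):
    """Assign every class whose polynomial score reaches the threshold."""
    scores = poly_expand(X, degree) @ W
    return scores >= threshold
```

Because the number of monomials grows quickly with the number of input features, a sketch like this is only practical on a heavily reduced feature set, which is consistent with the small feature fraction reported above.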

Our polynomial classifiers have achieved notable classification performance on rare classes and on closely related classes, which usually yield low classification accuracy with other classifiers.

Part of this notable performance is attributable to the new feature reduction techniques devised in this thesis, which ensure that all classes are covered evenly in the reduced feature set used for classification.

These feature reduction techniques have a great effect on enhancing classification performance, especially in the case of the polynomial classifiers.
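The idea of a reduced feature set that covers all classes evenly can be sketched as follows. The abstract does not describe the actual techniques, so this is only one possible reading: the scoring rule (difference in document frequency inside and outside a class), the fixed per-class quota, and the name `select_features_per_class` are hypothetical.

```python
import numpy as np


def select_features_per_class(X, Y, k_per_class):
    """Pick the top `k_per_class` terms for every class, then merge them.

    X: (n_docs, n_terms) term-count matrix (assumed input format).
    Y: (n_docs, n_classes) 0/1 indicator matrix.
    Returns a sorted list of selected term indices.
    """
    presence = (X > 0).astype(float)
    selected = set()
    for c in range(Y.shape[1]):
        in_class = Y[:, c] == 1
        if not in_class.any():
            continue  # no training documents for this class
        # Hypothetical score: how much more often a term occurs in
        # documents of this class than in all other documents.
        df_in = presence[in_class].mean(axis=0)
        df_out = presence[~in_class].mean(axis=0) if (~in_class).any() else 0.0
        score = df_in - df_out
        top = np.argsort(-score, kind="stable")[:k_per_class]
        selected.update(int(t) for t in top)
    return sorted(selected)
```

Giving each class its own quota means a rare class still contributes terms to the reduced set, whereas a single global ranking could fill the set with terms from the largest classes.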

We faced some difficulties in applying polynomial network techniques to automated TC.

However, we were able to solve those difficulties and come up with highly accurate polynomial classifiers which are able to outperform well-known, state-of-the-art text classifiers.

Main subjects

Information Technology and Computer Science

Number of pages

130

Table of contents

Table of contents.

Abstract.

Chapter one: Introduction.

Chapter two: A review of text categorization.

Chapter three: Well-known supervised machine learning algorithms in text categorization.

Chapter four: Polynomial networks and a proposed application in text categorization.

Chapter five: Testing the proposed approach.

Chapter six: Discussion of results and comparisons with earlier work.

Chapter seven: Conclusions and future directions.

References.

American Psychological Association (APA) citation style

al-Tahrawi, Mayy M. (2005). Text categorization using polynomial networks (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-304744

Modern Language Association (MLA) citation style

al-Tahrawi, Mayy M. Text categorization using polynomial networks. (Doctoral dissertation). Arab Academy for Financial and Banking Sciences. (2005).
https://search.emarefa.net/detail/BIM-304744

American Medical Association (AMA) citation style

al-Tahrawi, Mayy M. (2005). Text categorization using polynomial networks (Doctoral dissertation). Arab Academy for Financial and Banking Sciences, Jordan.
https://search.emarefa.net/detail/BIM-304744

Text language

English

Data type

Theses and dissertations

Record number

BIM-304744