Arabic keyword extraction using artificial neural networks

Other titles

استخراج الكلمات المفتاحية من النص العربي باستخدام الشبكات العصبية الاصطناعية (Keyword extraction from Arabic text using artificial neural networks)

Thesis author

al-Amush, Ibtihal H.

Thesis supervisor

Samawi, Venus W.

Committee members

Shatnawi, Umar Ali
al-Nihoud, Jihad Quball Awdah
al-Hajj, Ali Muhammad Muhammad

University

Al al-Bayt University

College

Prince Hussein Bin Abdullah College for Information Technology

Academic department

Department of Computer Science

University country

Jordan

Degree

Master's

Degree date

2012

English abstract

The main objective of this work is keyword extraction.

The proposed work presents a technique for extracting keywords from a single Arabic text document using statistical features.

A Kohonen artificial neural network (ANN) approach is used to cluster keywords.

The proposed model consists of three main stages. The first stage is document preprocessing, in which five linguistic operations are applied: removing non-Arabic letters; lexical analysis of the text (eliminating punctuation marks, digits, and special symbols); removing stop words; light stemming; and excluding words shorter than three letters.
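
The five preprocessing operations listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list, the prefix and suffix tables, and the function names `light_stem` and `preprocess` are hypothetical and are not taken from the thesis, whose actual stemmer and stop-word list are not described in this record.

```python
import re

# Hypothetical, tiny stop-word list; a real system would use a full Arabic list.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "التي", "الذي"}

# Illustrative light-stemming affixes (assumed, not taken from the thesis).
# Longer prefixes come first so that they are tried before their shorter parts.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي")

# Runs of basic Arabic letters; diacritics fall outside this range,
# so fully vocalised text would need to be de-vocalised first.
ARABIC_LETTERS = re.compile(r"[\u0621-\u064A]+")


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least three letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word


def preprocess(text: str) -> list[str]:
    # Operations 1-2: keep only Arabic-letter runs, which drops Latin text,
    # digits, punctuation marks, and special symbols in one pass.
    tokens = ARABIC_LETTERS.findall(text)
    # Operation 3: remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Operation 4: light stemming.
    tokens = [light_stem(t) for t in tokens]
    # Operation 5: exclude words shorter than three letters.
    return [t for t in tokens if len(t) >= 3]
```

Merging the first two operations into a single regular-expression pass is a simplification chosen for brevity; the thesis lists them as separate steps.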

The second stage generates a statistical feature vector for each word.

The proposed system is based on an analysis of term-occurrence characteristics: the Term Frequency (TF), whether the word occurs in the First Sentence (FS) of the text, whether it occurs in the Last Sentence (LS) of the text, whether it appears in the document Title (T), and the spread of the word over the document as measured by Sentence Frequency (SF).
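
As a rough illustration of the feature vector described above, the sketch below computes (T, TF, SF, FS, LS) for every word of an already preprocessed document. The function name `feature_vectors` and the input layout are hypothetical, and SF is assumed here to mean the number of sentences that contain the word; the thesis may define it differently.

```python
from collections import Counter


def feature_vectors(title, sentences):
    """Map each word to the tuple (T, TF, SF, FS, LS).

    title: list of preprocessed title tokens.
    sentences: non-empty list of sentences, each a list of preprocessed tokens.
    """
    # TF: total number of occurrences of the word in the document body.
    tf = Counter(w for s in sentences for w in s)
    # SF (assumed definition): number of sentences in which the word occurs.
    sf = Counter(w for s in sentences for w in set(s))
    title_set = set(title)
    first, last = set(sentences[0]), set(sentences[-1])
    return {
        w: (int(w in title_set), tf[w], sf[w], int(w in first), int(w in last))
        for w in tf
    }
```

The binary features T, FS, and LS are encoded as 0 or 1 so that every component of the vector is numeric and can be fed to a clustering network.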

In this work, we also studied the effect of using Normalized Term Frequency (NTF) and Ratio of Sentence Frequency (RSF) on the clustering accuracy, as well as the effect of the absence or presence of each feature on the results of the proposed system, in order to identify the best feature set.
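
The abstract does not define NTF or RSF. One plausible reading, shown here purely as an assumption, divides TF by the largest term frequency in the document and divides SF by the number of sentences, so that both values fall in the range 0 to 1. The function name `normalize` is hypothetical.

```python
def normalize(tf, sf, num_sentences):
    """Return (ntf, rsf) dictionaries under the assumed definitions.

    tf, sf: dicts mapping each word to its term and sentence frequency.
    num_sentences: number of sentences in the document (must be > 0).
    """
    max_tf = max(tf.values())
    # Assumed NTF: term frequency relative to the most frequent term.
    ntf = {w: count / max_tf for w, count in tf.items()}
    # Assumed RSF: fraction of the document's sentences containing the word.
    rsf = {w: sf[w] / num_sentences for w in tf}
    return ntf, rsf
```

Scaling of this kind keeps long and short documents comparable, which may be why its effect on clustering accuracy was studied.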

Finally, a SOM (Kohonen neural network) is constructed to cluster keywords, where the number of nodes in the input layer depends on the number of features in the feature vector, and the output layer has two nodes (keyword or non-keyword).

The winner node (keyword) is the one that has the highest weight.
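
A minimal sketch of a Kohonen layer with two output nodes is given below. It uses the textbook winner-take-all rule, in which the winner is the node whose weight vector lies closest to the input; this may differ from the selection rule used in the thesis. The function names, learning rate, epoch count, and initialisation are all assumptions made for illustration.

```python
import random


def winner(weights, v):
    """Index of the node whose weight vector is nearest to v (squared Euclidean)."""
    dists = [sum((w - x) ** 2 for w, x in zip(node, v)) for node in weights]
    return dists.index(min(dists))


def train_two_node_som(vectors, epochs=50, lr=0.5, seed=0):
    """Winner-take-all training of a Kohonen layer with two output nodes.

    vectors: list of equal-length numeric feature vectors (at least two).
    Returns the two trained weight vectors.
    """
    rng = random.Random(seed)
    # Initialise each node's weights from a randomly chosen input vector.
    weights = [list(v) for v in rng.sample(vectors, 2)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for v in vectors:
            win = winner(weights, v)
            # Move only the winning node towards the input vector.
            weights[win] = [w + rate * (x - w) for w, x in zip(weights[win], v)]
    return weights
```

After training, `winner(weights, v)` assigns a word's feature vector to one of the two clusters. Deciding which cluster is read as "keyword" is a separate step that this sketch does not settle.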

The performance of the proposed model is evaluated using recall, precision, and F-measure.
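
For reference, the three measures can be computed from the set of extracted keywords and the set of reference keywords as in the sketch below. The function name `evaluate` is hypothetical, and the balanced F-measure (F1) is assumed, since the abstract does not say which weighting was used.

```python
def evaluate(extracted, reference):
    """Return (precision, recall, f_measure) for two collections of keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # keywords found in both sets
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, extracting four words of which two appear among three reference keywords gives a precision of 2/4 and a recall of 2/3.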

The adopted Kohonen neural network is applied to 48 documents: 24 selected from the Jordan Journal of Social Sciences (JJSS) and 24 selected from the Arabic Wikipedia dataset.

The result of each experiment is then compared with the actual keywords associated with each document (for the Wikipedia dataset, the meta-tags are taken as the keywords; for the JJSS dataset, keywords are supplied with each document).

The system performance has been compared with that of the Sakhr keyword extractor.

In general, the proposed system showed performance comparable to that of the Sakhr system.

To specify the best feature set, 12 different combinations of statistical features are considered.

In the experiments, the best average recall, 52.63%, was obtained with the feature set <T, TF, SF, FS, LS>.

The best average precision, 42.84%, was obtained when feature set < … > was used.

Finally, the best average F-measure is achieved when < … > alone is used.

Main subjects

Information Technology and Computer Science

Topics

Number of pages

56

Table of contents

Table of contents.

Abstract.

Chapter One: Overview.

Chapter Two: Literature survey.

Chapter Three: Theoretical background.

Chapter Four: Development of the suggested system.

Chapter Five: Experimentation and results analysis.

Chapter Six: Conclusion and future work.

References.

American Psychological Association (APA) citation style

al-Amush, Ibtihal H. (2012). Arabic keyword extraction using artificial neural networks (Master's thesis). Al al-Bayt University, Jordan.
https://search.emarefa.net/detail/BIM-321374

Modern Language Association (MLA) citation style

al-Amush, Ibtihal H. Arabic keyword extraction using artificial neural networks. Master's thesis. Al al-Bayt University, 2012.
https://search.emarefa.net/detail/BIM-321374

American Medical Association (AMA) citation style

al-Amush, Ibtihal H. (2012). Arabic keyword extraction using artificial neural networks (Master's thesis). Al al-Bayt University, Jordan.
https://search.emarefa.net/detail/BIM-321374

Text language

English

Data type

Theses

Record number

BIM-321374