Arabic keyword extraction using artificial neural networks

Other Title(s)

استخراج الكلمات المفتاحية من النص العربي باستخدام الشبكات العصبية الاصطناعية

Dissertant

al-Amush, Ibtihal H.

Thesis advisor

Samawi, Venus W.

Committee Members

Shatnawi, Umar Ali
al-Nihoud, Jihad Quball Awdah
al-Hajj, Ali Muhammad Muhammad

University

Al albayt University

Faculty

Prince Hussein Bin Abdullah Faculty for Information Technology

Department

Department of Computer Science

University Country

Jordan

Degree

Master

Degree Date

2012

English Abstract

The main objective of this work is keyword extraction.

The proposed work presents a technique for extracting keywords from a single Arabic text document using statistical features.

A Kohonen artificial neural network (ANN) is used to cluster keywords.

The proposed model consists of three main stages. In the first stage, document preprocessing, five linguistic operations are applied: removing non-Arabic letters; lexical analysis of the text (eliminating punctuation marks, digits, and special symbols); removing stop-words; light stemming; and excluding words shorter than three letters.
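The five preprocessing operations can be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the stop-word list, the prefix and suffix lists used for light stemming, and the function names `light_stem` and `preprocess` are all assumptions made for the example.

```python
import re

# Hypothetical, tiny stop-word list; a real system would use a much larger one.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هذا", "هذه", "التي", "الذي"}

# Assumed affix lists for light stemming (illustrative only).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ات", "ان", "ون", "ين", "ها", "ية", "ة")


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Return the stemmed candidate words of an Arabic text."""
    # Drop diacritics and the tatweel so they do not split words.
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    # Replace everything that is not an Arabic letter (Latin text,
    # punctuation, digits, special symbols) with a space.
    text = re.sub(r"[^\u0621-\u063A\u0641-\u064A]+", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    tokens = [light_stem(t) for t in tokens]
    # Exclude words shorter than three letters.
    return [t for t in tokens if len(t) >= 3]
```

In this sketch, stop-words are removed before stemming so that the list can hold surface forms; whether the thesis applies the steps in exactly this order beyond what the abstract lists is not stated.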

The second stage generates a statistical feature vector for each word.

The proposed system is based on the analysis of term occurrence characteristics: the Term Frequency (TF), whether the word appears in the First Sentence (FS) of the text, whether it appears in the Last Sentence (LS) of the text, whether it appears in the document Title (T), and how widely the word is spread over the document, as measured by Sentence Frequency (SF).
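A minimal sketch of how such a feature vector might be built is given below. It assumes the document has already been split into sentences and preprocessed into token lists; the function name `feature_vectors`, the tuple ordering, and the 0/1 encoding of FS, LS, and T are assumptions for illustration.

```python
from collections import Counter


def feature_vectors(title_tokens, sentences):
    """Return {word: (TF, FS, LS, T, SF)} for one document.

    sentences: list of token lists, one list per sentence.
    """
    if not sentences:
        return {}
    # TF: total number of occurrences of the word in the document body.
    tf = Counter(tok for sent in sentences for tok in sent)
    # SF: number of sentences that contain the word at least once.
    sf = Counter(tok for sent in sentences for tok in set(sent))
    first, last = set(sentences[0]), set(sentences[-1])
    title = set(title_tokens)
    return {
        w: (tf[w], int(w in first), int(w in last), int(w in title), sf[w])
        for w in tf
    }
```

For example, a word that occurs three times across two sentences, including the first and last sentence, and also appears in the title would get the vector `(3, 1, 1, 1, 2)`.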

In this work, we also studied the effect of using Normalized Term Frequency (NTF) and Ratio of Sentence Frequency (RSF) on clustering accuracy, as well as the effect of the presence or absence of each feature on the results, in order to identify the best feature set.
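The abstract does not define NTF or RSF. One common reading, shown here purely as an assumption, is that NTF divides each term frequency by the largest term frequency in the document and RSF divides each sentence frequency by the number of sentences. The function name `normalized_features` is hypothetical.

```python
def normalized_features(tf, sf, n_sentences):
    """Return (NTF, RSF) dictionaries under the assumed definitions.

    tf, sf: {word: count} mappings; n_sentences: sentences in the document.
    """
    max_tf = max(tf.values())
    # Assumed NTF: term frequency scaled by the document's maximum TF.
    ntf = {w: count / max_tf for w, count in tf.items()}
    # Assumed RSF: share of the document's sentences containing the word.
    rsf = {w: sf[w] / n_sentences for w in tf}
    return ntf, rsf
```

Both values then fall in the range 0 to 1, which puts them on the same scale as the binary FS, LS, and T features.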

Finally, in the third stage, a SOM (Kohonen neural network) is constructed to cluster keywords: the number of nodes in the input layer depends on the number of features in the feature vector, and the output layer has two nodes (keyword or non-keyword).

The winner node (keyword) is the one that has the highest weight.
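A two-node Kohonen layer of this kind can be sketched as below. This is a simplified illustration under stated assumptions, not the thesis's network: the winner for each input is chosen by smallest squared Euclidean distance (the usual Kohonen rule), only the winner is updated because there are just two output nodes, the learning rate decays linearly, and the node whose weights sum to the larger value is treated as the keyword node. The names `train_som`, `winner`, and `label_keywords` and all parameter defaults are hypothetical.

```python
import random


def winner(weights, vector):
    """Index of the node whose weight vector is closest to the input."""
    dists = [sum((w - x) ** 2 for w, x in zip(node, vector)) for node in weights]
    return dists.index(min(dists))


def train_som(vectors, epochs=50, lr0=0.5, seed=0):
    """Train two output nodes by winner-take-all competitive learning."""
    rng = random.Random(seed)
    # Initialise the two weight vectors from two distinct training samples.
    weights = [list(v) for v in rng.sample(vectors, 2)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)  # linearly decaying learning rate
        for v in vectors:
            win = winner(weights, v)
            # Move only the winning node toward the input.
            weights[win] = [w + lr * (x - w) for w, x in zip(weights[win], v)]
    return weights


def label_keywords(words, vectors, weights):
    """Return the words mapped to the assumed keyword node."""
    # Assumption: the keyword node is the one with the larger summed weights.
    kw_node = max(range(len(weights)), key=lambda i: sum(weights[i]))
    return [w for w, v in zip(words, vectors) if winner(weights, v) == kw_node]
```

A possible usage, with made-up feature vectors already scaled to the range 0 to 1:

```python
words = ["a", "b", "c", "d", "e"]
vectors = [(0.9, 1, 1, 0.8), (1.0, 1, 0, 0.9),
           (0.1, 0, 0, 0.1), (0.2, 0, 0, 0.1), (0.1, 0, 0, 0.2)]
weights = train_som(vectors)
keywords = label_keywords(words, vectors, weights)
```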

The proposed model performance is evaluated using recall, precision, and F-measure.
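For reference, these three measures are conventionally computed from the overlap between the extracted keywords and the reference keywords of a document. The sketch below uses the standard definitions; the function name `evaluate` and the exact-match comparison of keywords are assumptions, since the abstract does not say how matches are counted.

```python
def evaluate(extracted, reference):
    """Return (precision, recall, F-measure) for one document."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # correctly extracted keywords
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Averaging each value over all documents of a dataset gives per-dataset figures comparable in kind to the averages reported in this abstract.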

The adopted Kohonen neural network is applied to 48 documents (24 selected from the Jordan Journal of Social Sciences (JJSS) and 24 selected from the Arabic Wikipedia dataset).

The result of each experiment is then compared with the actual keywords associated with each document (for the Wikipedia dataset, the meta-tags are treated as keywords; for the JJSS dataset, keywords are supplied with each document).

The system's performance was compared with that of the Sakhr keyword extractor.

In general, the proposed system showed performance comparable to that of the Sakhr system.

To identify the best feature set, 12 different combinations of statistical features were considered.

In the experiments, the best average recall, 52.63%, was obtained with the feature set <T, TF, SF, FS, LS>.

The best average of precisions was when feature set is used, where on average the precision = 42.84 %.

Finally, the best F-measure on average is achieved when alone is used.

Main Subjects

Information Technology and Computer Science

No. of Pages

56

Table of Contents

Table of contents.

Abstract.

Chapter One : overview.

Chapter Two : literature survey.

Chapter Three : theoretical background.

Chapter Four : development of the suggested system.

Chapter Five : experimentation and results analysis.

Chapter Six : conclusion and future work.

References.

American Psychological Association (APA)

al-Amush, Ibtihal H. (2012). Arabic keyword extraction using artificial neural networks (Master's thesis). Al albayt University, Jordan.
https://search.emarefa.net/detail/BIM-321374

Modern Language Association (MLA)

al-Amush, Ibtihal H. Arabic keyword extraction using artificial neural networks (Master's thesis). Al albayt University. (2012).
https://search.emarefa.net/detail/BIM-321374

American Medical Association (AMA)

al-Amush, Ibtihal H. (2012). Arabic keyword extraction using artificial neural networks (Master's thesis). Al albayt University, Jordan.
https://search.emarefa.net/detail/BIM-321374

Language

English

Data Type

Arab Theses

Record ID

BIM-321374