Tag recommendation for short Arabic text by using latent semantic analysis of Wikipedia

Other titles

اقتراح أوسمة للنصوص العربية القصيرة باستخدام تحليل الدلالات الكامنة على الويكيبيديا العربية (Tag recommendation for short Arabic texts using latent semantic analysis of the Arabic Wikipedia)

Thesis author

Abu Samrah, Yusuf Khamis

Thesis supervisor

al-Agha, Iyad Muhammad

University

Islamic University

Faculty

Faculty of Information Technology

Academic department

Information Technology

University country

Palestine (Gaza Strip)

Degree

Master's

Degree date

2017

English abstract

Social media sites enable users to share items, such as texts and images, and annotate them with freely chosen keywords called tags.

However, this freedom comes at a cost: an uncontrolled vocabulary can lead to tag redundancy, ambiguity, sparsity, misspellings, and idiosyncrasy, which impedes the effective organization and retrieval of resources in tagging systems.

This work proposes an Arabic-language tag recommender system that exploits the Arabic Wikipedia as background knowledge.

Latent semantic analysis was employed to discover hidden semantic relationships between the short text and Wikipedia articles.

Apache Spark was used to handle the massive content of Wikipedia and the complex computations of latent semantic analysis, which decomposes a term-by-article matrix built from the Wikipedia articles into three matrices.

Given a short Arabic text as input, the system compares it with the bodies of the articles and scores each article by its relevance to the short text.
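The abstract gives no implementation details, so the following is only a minimal sketch of the general latent semantic analysis technique it names, written with NumPy rather than Apache Spark and run on a hypothetical six-term, four-article count matrix instead of the Arabic Wikipedia. It assumes raw term counts (no weighting), a rank-k truncated SVD as the three matrices, folding the short text into the latent space as q_hat = S_k^-1 U_k^T q, and cosine similarity as the relevance score; none of these choices is confirmed to be the thesis's own, and the names `lsa`, `vectorize`, and `score_articles` are hypothetical.

```python
import numpy as np

# Hypothetical toy vocabulary and term-by-article count matrix A
# (rows = terms, columns = articles); a stand-in for Arabic Wikipedia.
VOCAB = ["كرة", "مباراة", "هدف", "حكومة", "وزير", "انتخابات"]
A = np.array([
    # art0 art1 art2 art3
    [3, 0, 2, 0],  # كرة (ball)
    [2, 0, 3, 0],  # مباراة (match)
    [1, 0, 2, 0],  # هدف (goal)
    [0, 3, 0, 1],  # حكومة (government)
    [0, 1, 0, 2],  # وزير (minister)
    [0, 1, 0, 2],  # انتخابات (elections)
], dtype=float)


def lsa(A, k):
    """Truncated SVD, A ~ U_k @ diag(s_k) @ Vt_k: the three LSA matrices."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def vectorize(tokens, vocab=VOCAB):
    """Raw term counts of a short text over the vocabulary (unknown tokens ignored)."""
    index = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            q[index[tok]] += 1
    return q


def score_articles(q, U_k, s_k, Vt_k):
    """Fold the query into latent space and return one cosine score per article."""
    q_hat = (U_k.T @ q) / s_k  # q_hat = S_k^-1 U_k^T q
    docs = Vt_k.T              # row j = latent vector of article j
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    # Guard against a zero query vector (no known tokens).
    return (docs @ q_hat) / np.where(denom == 0, 1.0, denom)


U_k, s_k, Vt_k = lsa(A, k=2)
scores = score_articles(vectorize(["كرة", "هدف"]), U_k, s_k, Vt_k)
ranking = np.argsort(-scores)  # article indices, most relevant first
print(ranking, np.round(scores, 3))
```

With this toy matrix the two sports articles (columns 0 and 2) score close to 1 for the query and the politics articles close to 0. On the full Wikipedia matrix the same decomposition requires a distributed implementation, which the abstract gives as the reason for using Spark.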

Candidate tags are derived from the titles and categories of the top-scored articles.
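The abstract does not say how titles and categories are combined into a final tag list. One plausible reading, sketched here purely as an assumption, lets every title and category of the top-N articles accumulate the relevance scores of the articles it comes from and then ranks candidates by that total. The function name `candidate_tags`, the tuple format, the `top_n` and `max_tags` parameters, and the example articles and scores are all hypothetical.

```python
from collections import defaultdict


def candidate_tags(scored_articles, top_n=3, max_tags=5):
    """Rank candidate tags from the titles and categories of the top-scored articles.

    scored_articles: list of (score, title, categories) tuples (hypothetical format).
    Each title or category accumulates the scores of the top articles it appears in.
    """
    top = sorted(scored_articles, key=lambda a: a[0], reverse=True)[:top_n]
    weights = defaultdict(float)
    for score, title, categories in top:
        weights[title] += score
        for cat in categories:
            weights[cat] += score
    # Highest accumulated weight first; ties broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:max_tags]]


# Hypothetical relevance scores for a short football-related text.
articles = [
    (0.92, "كرة القدم", ["رياضة", "ألعاب الكرة"]),
    (0.85, "كأس العالم لكرة القدم", ["رياضة", "بطولات كرة القدم"]),
    (0.40, "الدوري الإسباني", ["رياضة", "دوريات كرة القدم"]),
    (0.10, "انتخابات", ["سياسة"]),
]
tags = candidate_tags(articles, top_n=3, max_tags=4)
print(tags)
```

Here the shared category "رياضة" (sports) ranks first because three top articles contribute to it, while the low-scoring politics article falls outside the top N and contributes nothing.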

The proposed system was assessed over a dataset of 100 tweets covering three different domains.

Generated tags were rated by two human experts in each domain.

Our system achieved a mean average precision of 84.39% and a mean reciprocal rank of 96.53%, indicating that it is adequate and accurate for tagging short Arabic texts, although it still has difficulties related to the Arabic language and is affected by the frequencies of rare terms.
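For readers unfamiliar with the two reported metrics, this sketch computes them from per-tweet relevance judgments of the ranked tags. It assumes one binary relevant/not-relevant label per generated tag (the abstract does not say how the two experts' ratings were merged) and normalizes average precision by the number of relevant tags retrieved, a common choice when the full set of relevant tags is unknown. The judgments in the example are invented for illustration and are unrelated to the thesis's data.

```python
def average_precision(relevance):
    """Average precision of one ranked tag list.

    relevance[i] is True if the tag at rank i + 1 was judged relevant.
    Normalized by the number of relevant tags retrieved (an assumption).
    """
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant tag, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_metrics(judgments):
    """Return (MAP, MRR) over a list of per-tweet relevance lists."""
    n = len(judgments)
    map_score = sum(average_precision(r) for r in judgments) / n
    mrr_score = sum(reciprocal_rank(r) for r in judgments) / n
    return map_score, mrr_score


# Hypothetical judgments for three tweets (not data from the thesis).
judgments = [
    [True, False, True],
    [False, True, True, False],
    [True, True],
]
map_score, mrr_score = mean_metrics(judgments)
print(f"MAP = {map_score:.4f}, MRR = {mrr_score:.4f}")
```

For these three hypothetical tweets the per-tweet average precisions are 5/6, 7/12, and 1, giving MAP = 29/36 (about 0.806), and the reciprocal ranks are 1, 1/2, and 1, giving MRR = 5/6 (about 0.833). MRR rewards only the position of the first relevant tag, which is why it is typically higher than MAP, as in the reported results.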

The evaluation results are analyzed and discussed in detail, covering the system's strengths and limitations and offering recommendations for future improvement.

Main subjects

Information Technology and Computer Science

Number of pages

72

Table of contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Literature review.

Chapter Three: Methodology.

Chapter Four: Results and discussion.

Chapter Five: Conclusion.

References.

American Psychological Association (APA) citation style

Abu Samrah, Yusuf Khamis. (2017). Tag recommendation for short Arabic text by using latent semantic analysis of Wikipedia. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905921

Modern Language Association (MLA) citation style

Abu Samrah, Yusuf Khamis. Tag recommendation for short Arabic text by using latent semantic analysis of Wikipedia. (Master's thesis). Islamic University. (2017).
https://search.emarefa.net/detail/BIM-905921

American Medical Association (AMA) citation style

Abu Samrah, Yusuf Khamis. (2017). Tag recommendation for short Arabic text by using latent semantic analysis of Wikipedia. (Master's thesis). Islamic University, Palestine (Gaza Strip)
https://search.emarefa.net/detail/BIM-905921

Language of text

English

Record type

Theses and dissertations

Record number

BIM-905921