Automatic term extraction using statistical techniques ; a comparative in-depth study and applications

Thesis Author

Sabah, Yusuf

Thesis Advisor

Abu al-Rubb, Haytham
Abu Zarr, Yusuf

University

Birzeit University

Faculty

Faculty of Engineering and Technology

Department

Department of Computer Science

University Country

Palestine (West Bank)

Degree

Master's

Degree Date

2005

English Abstract

The idea of Information Retrieval (IR) emerged during the shift in how cultural, social, and scientific information is stored, from ink on paper to digital libraries distributed across international networks.

Typically, this information concerns material such as text, graphic documents (pictures, maps, technical drawings, etc.), sound, and moving images.

In an attempt to make such huge amounts of information efficiently retrievable, several techniques for Automatic Term Extraction (ATE) have been proposed.

This automatic procedure is considered the cornerstone of a wide range of applications, such as search engines, because manual production of keywords is highly labor-intensive.

To ensure precise information retrieval, the extracted keywords should accurately describe the contents of their documents.

To improve this operation, researchers have proposed many techniques for Automatic Term Extraction (ATE), or Automatic Indexing; some use statistical techniques, while others use syntactic and probabilistic techniques.

This thesis is a comparative study intended to guide the use of statistical techniques. It covers four of them: Term Frequency (TF), Inverse Document Frequency (IDF), combined Term Frequency-Inverse Document Frequency (TF×IDF), and the Term Discrimination Value Model (TDVM).
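The four techniques named above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the raw-count TF, the log(N/df) form of IDF, and the use of average pairwise cosine similarity as the "space density" for the discrimination value are all assumptions chosen from common textbook formulations, and the three tiny documents are made up.

```python
import math
from collections import Counter
from itertools import combinations


def term_frequency(doc_tokens):
    """TF: raw count of each term in a single tokenized document."""
    return Counter(doc_tokens)


def inverse_document_frequency(docs):
    """IDF(t) = log(N / df(t)); df(t) is the number of documents containing t."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """TFxIDF: one {term: weight} dict per document."""
    idf = inverse_document_frequency(docs)
    return [{t: tf * idf[t] for t, tf in term_frequency(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(vectors):
    """Average pairwise cosine similarity; needs at least two documents."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_values(docs):
    """TDVM (assumed form): DV(t) = density without t - density with all terms.

    A positive value means removing t makes documents look more alike,
    so t helps tell documents apart.
    """
    vectors = [term_frequency(doc) for doc in docs]
    base = density(vectors)
    vocabulary = set().union(*vectors)
    values = {}
    for term in vocabulary:
        reduced = [{t: w for t, w in vec.items() if t != term} for vec in vectors]
        values[term] = density(reduced) - base
    return values


if __name__ == "__main__":
    # Hypothetical toy collection: "the" occurs everywhere, the nouns once each.
    docs = [["the", "cat"], ["the", "dog"], ["the", "fish"]]
    print(tf_idf(docs))
    print(discrimination_values(docs))
```

In this toy collection the word that occurs in every document gets an IDF of zero and a negative discrimination value, while each rare word gets a positive one, which is the behaviour these weighting schemes are meant to capture.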

We have also developed a computational tool, the Automatic Term Extraction Workbench (ATEWB), for use in the comparison; three experiments are used to identify the conditions under which each technique is most efficient and/or accurate.

In addition, this thesis aims to improve the efficiency of statistical techniques by using database engines to reduce the computation time of their algorithms, and to improve document retrieval by caching documents in the database.
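As a rough illustration of pushing the counting work into a database engine, the sketch below stores documents and per-document term counts in SQLite and lets SQL compute document frequencies and TF×IDF weights in one query. The abstract does not name the engine or schema used in the thesis, so SQLite, the table names `documents` and `postings`, and the helper names are all hypothetical.

```python
import math
import sqlite3


def build_index(docs):
    """Store each document body (the cache) and its term counts in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE postings (doc_id INTEGER, term TEXT, tf INTEGER, "
        "PRIMARY KEY (doc_id, term))"
    )
    for doc_id, body in enumerate(docs):
        conn.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, body))
        counts = {}
        for token in body.lower().split():  # naive whitespace tokenizer (assumption)
            counts[token] = counts.get(token, 0) + 1
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(doc_id, term, tf) for term, tf in counts.items()],
        )
    conn.commit()
    return conn


def tf_idf_rows(conn):
    """Return (doc_id, term, weight) rows, with df and N aggregated by the engine."""
    # Register a natural-log function so the query does not depend on
    # SQLite having been compiled with its optional math functions.
    conn.create_function("py_ln", 1, math.log)
    query = """
        SELECT p.doc_id, p.term, p.tf * py_ln(1.0 * n.total / d.df) AS weight
        FROM postings AS p
        JOIN (SELECT term, COUNT(*) AS df FROM postings GROUP BY term) AS d
          ON d.term = p.term
        CROSS JOIN (SELECT COUNT(*) AS total FROM documents) AS n
        ORDER BY p.doc_id, weight DESC, p.term
    """
    return conn.execute(query).fetchall()


def fetch_document(conn, doc_id):
    """Retrieve a cached document body by id, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_index(["the cat", "the dog", "the fish"])
    for row in tf_idf_rows(conn):
        print(row)
    print(fetch_document(conn, 1))
```

The design choice here is that grouping and joining are done by the engine, which can use its primary-key indexes, rather than by nested loops in application code.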

We tested our model on a collection of abstracts of papers in the field of automatic term extraction, which contain keywords assigned by their authors, in the first experiment, and on a collection of test documents available on web sites concerned with IR.
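One plausible way to score extracted terms against author-assigned keywords is precision and recall. The abstract does not state which measure the thesis used, so the measure itself, the function name, the case-insensitive exact matching, and the sample terms below are all assumptions.

```python
def precision_recall(extracted, reference):
    """Compare extracted terms with reference keywords, ignoring case.

    precision = matched / number of extracted terms
    recall    = matched / number of reference keywords
    """
    extracted_set = {term.lower() for term in extracted}
    reference_set = {term.lower() for term in reference}
    hits = len(extracted_set & reference_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical extractor output and author keywords for one abstract.
    extracted = ["term extraction", "indexing", "corpus", "retrieval"]
    author_keywords = ["Term Extraction", "information retrieval"]
    print(precision_recall(extracted, author_keywords))
```

Exact matching is strict: "retrieval" does not count as a match for "information retrieval", so a real evaluation might prefer stemming or partial matching.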

Main Subjects

Information Technology and Computer Science

Topics

Number of Pages

142

Table of Contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Related work.

Chapter three : Automatic term extraction (ATE).

Chapter four : Automatic term extraction workbench (ATEWB) system description.

Chapter five : A comparative study.

Chapter six : Conclusions and future work.

References.

American Psychological Association (APA) Citation Style

Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-303496

Modern Language Association (MLA) Citation Style

Sabah, Yusuf. Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University. (2005).
https://search.emarefa.net/detail/BIM-303496

American Medical Association (AMA) Citation Style

Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-303496

Text Language

English

Data Type

Theses

Record ID

BIM-303496