Automatic term extraction using statistical techniques ; a comparative in-depth study and applications
Thesis submitted by
Thesis supervisor
Abu al-Rubb, Haytham
Abu Zarr, Yusuf
University
Birzeit University
Faculty
Faculty of Engineering and Technology
Academic department
Department of Computer Science
Country of the university
Palestine (West Bank)
Degree
Master's
Degree date
2005
English abstract
The idea of Information Retrieval (IR) emerged from the evolution in how cultural, social, and scientific information is stored: from ink on paper to digital libraries distributed over international networks.
Typically, this information includes text, graphical documents (pictures, maps, technical drawings, etc.), sound, and moving images.
In an attempt to make such large amounts of information efficiently retrievable, several techniques for Automatic Term Extraction (ATE) have been proposed.
This automatic procedure is considered a cornerstone of a wide range of applications, such as search engines, because producing keywords manually is highly labor-intensive.
To ensure precise information retrieval, the extracted keywords should accurately describe the contents of their documents.
To improve this process, researchers have proposed many techniques for Automatic Term Extraction (ATE), also known as automatic indexing; some use statistical techniques, while others use syntactic and probabilistic ones.
This thesis is a comparative study intended to guide the use of four statistical techniques: Term Frequency (TF), Inverse Document Frequency (IDF), combined Term Frequency-Inverse Document Frequency (TF×IDF), and the Term Discrimination Value Model (TDVM).
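The four weighting schemes named above can be illustrated with a short Python sketch. It assumes common textbook definitions rather than the thesis's own implementation: raw counts for TF, idf(t) = log(N / df(t)) with the natural logarithm for IDF, their product for TF×IDF, and, for the discrimination value, the change in collection density (average pairwise cosine similarity of raw-TF vectors) when a term is removed. All function names are hypothetical, and the thesis may use other normalizations, logarithm bases, or density measures.

```python
import math
from collections import Counter
from itertools import combinations

def term_frequencies(docs):
    """TF: raw count of each term in each tokenized document."""
    return [Counter(doc) for doc in docs]

def inverse_document_frequency(docs):
    """IDF (assumed form): log(N / df(t)), df(t) = number of documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tf_idf(docs):
    """TF x IDF: weight of term t in document d is tf(t, d) * idf(t)."""
    idf = inverse_document_frequency(docs)
    return [{t: f * idf[t] for t, f in tf.items()} for tf in term_frequencies(docs)]

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def density(vectors):
    """Collection density (assumed): average pairwise cosine similarity."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def discrimination_value(docs, term):
    """DV = density without the term - density with it.

    Positive: removing the term makes documents more alike (good discriminator).
    Negative: the term makes documents look alike (poor discriminator).
    """
    vectors = term_frequencies(docs)
    without = [{t: f for t, f in v.items() if t != term} for v in vectors]
    return density(without) - density(vectors)

# Toy collection of three tokenized documents.
docs = [
    ["term", "extraction", "statistics"],
    ["term", "retrieval", "index"],
    ["term", "extraction", "index"],
]
weights = tf_idf(docs)
```

In this toy collection, "term" occurs in every document, so its IDF and TF×IDF weights are 0 and its discrimination value is negative, whereas "statistics" occurs in only one document and receives a positive discrimination value.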
We also developed a computational tool, the Automatic Term Extraction Workbench (ATEWB), for use in the comparison; three experiments were carried out with it to determine the conditions under which each technique is most efficient and/or accurate.
In addition, this thesis aims to improve the efficiency of the statistical techniques by using database engines to reduce the computation time of their algorithms, and to improve document retrieval by caching documents in the database.
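One way a database engine might take over part of this work is sketched below. It assumes Python's built-in `sqlite3` module and a hypothetical schema (`documents` and `postings` tables), because the abstract names neither the engine nor the schema used in ATEWB. Documents are cached in one table and per-document term counts in another. The document-frequency counting that IDF needs is delegated to an indexed `GROUP BY` query. The logarithm is taken in Python because SQLite's built-in math functions are a compile-time option.

```python
import math
import sqlite3
from collections import Counter

def build_index(docs):
    """Cache each document and its term counts in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE postings (doc_id INTEGER, term TEXT, freq INTEGER)")
    for doc_id, body in enumerate(docs):
        conn.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, body))
        # Naive lowercase whitespace tokenization (an illustrative assumption).
        counts = Counter(body.lower().split())
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(doc_id, term, freq) for term, freq in counts.items()],
        )
    # An index on term lets the engine group and look up postings efficiently.
    conn.execute("CREATE INDEX idx_postings_term ON postings (term)")
    conn.commit()
    return conn

def idf_from_db(conn):
    """The database counts document frequencies; log(N / df) is computed in Python."""
    (n,) = conn.execute("SELECT COUNT(*) FROM documents").fetchone()
    rows = conn.execute(
        "SELECT term, COUNT(DISTINCT doc_id) FROM postings GROUP BY term"
    )
    return {term: math.log(n / df) for term, df in rows}

def fetch_document(conn, doc_id):
    """Retrieve a cached document body by id, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None

conn = build_index([
    "term extraction statistics",
    "term retrieval index",
    "term extraction index",
])
idf = idf_from_db(conn)
```

Here `idf["term"]` is 0 because the term appears in all three documents, and `fetch_document(conn, 1)` returns the cached text of the second document without re-reading any source file.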
We tested our model on two collections: a collection of abstracts of papers in the field of automatic term extraction, accompanied by keywords assigned by their authors (used in the first experiment), and a collection of test documents available on websites concerned with IR.
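The abstract does not state which measure was used to compare extracted terms with the authors' keywords. The following sketch assumes a standard set-based comparison using precision, recall, and F1. The function name `evaluate` and the case-insensitive exact matching are illustrative assumptions, not the thesis's documented procedure.

```python
def evaluate(extracted, author_keywords):
    """Set-based precision, recall and F1 of extracted terms against reference keywords.

    Assumption: two terms match only if they are identical after lowercasing.
    """
    found = {t.lower() for t in extracted}
    reference = {k.lower() for k in author_keywords}
    hits = len(found & reference)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Two of the three extracted terms appear among the four author keywords.
p, r, f = evaluate(
    ["indexing", "tf-idf", "database"],
    ["TF-IDF", "indexing", "term extraction", "retrieval"],
)
```

Exact matching is the simplest choice. A real evaluation might also credit partial or stemmed matches (for example "index" vs. "indexing"), which this sketch does not do.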
Main disciplines
Information Technology and Computer Science
Number of pages
142
Table of contents
Table of contents.
Abstract.
Chapter one : Introduction.
Chapter two : Related work.
Chapter three : Automatic term extraction (ATE).
Chapter four : Automatic term extraction workbench (ATEWB) system description.
Chapter five : A comparative study.
Chapter six : Conclusions and future work.
References.
American Psychological Association (APA) citation style
Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank).
https://search.emarefa.net/detail/BIM-303496
Modern Language Association (MLA) citation style
Sabah, Yusuf. Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University. (2005).
https://search.emarefa.net/detail/BIM-303496
Text language
English
Data type
Theses
Record number
BIM-303496