Automatic term extraction using statistical techniques ; a comparative in-depth study and applications
Dissertant
Thesis advisor
Abu al-Rubb, Haytham
Abu Zarr, Yusuf
University
Birzeit University
Faculty
Faculty of Engineering and Technology
Department
Department of Computer Science
University Country
Palestine (West Bank)
Degree
Master
Degree Date
2005
English Abstract
The idea of Information Retrieval (IR) emerged as the storage of cultural, social, and scientific information shifted from ink on paper to digital libraries distributed across international networks.
Typically, this information comprises material such as text, graphic documents (pictures, maps, technical drawings, etc.), sound, and moving images.
To make such huge amounts of information efficiently retrievable, several techniques for Automatic Term Extraction (ATE) have been proposed.
This automatic procedure is considered the cornerstone of a wide range of applications, such as search engines, because manual production of keywords is highly labor-intensive.
To ensure precise information retrieval, the extracted keywords should accurately describe the contents of their documents.
To improve this operation, researchers have proposed many techniques for Automatic Term Extraction (ATE), also called Automatic Indexing; some use statistical techniques, while others use syntactic and probabilistic techniques.
This thesis is a comparative study intended to guide the use of statistical techniques, covering four of them: Term Frequency (TF), Inverse Document Frequency (IDF), combined Term Frequency-Inverse Document Frequency (TF×IDF), and the Term Discrimination Value Model (TDVM).
We have also developed a computational tool, the Automatic Term Extraction Workbench (ATEWB), for use in the comparison; three experiments identify the conditions under which each technique is most efficient and/or accurate.
In addition, this thesis aims to improve the efficiency of statistical techniques by using database engines to reduce the computation time of their algorithms, and to improve document retrieval by caching documents in the database.
We tested our model on a collection of abstracts of papers in the field of automatic term extraction, with keywords assigned by their authors, in the first experiment, and on a collection of test documents available on some websites concerned with IR.
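As an illustration of the statistical weighting schemes named in the abstract, the following is a minimal sketch of TF, IDF, and TF×IDF scoring in Python. It is not the ATEWB implementation: the whitespace tokenizer, the function names (`tokenize`, `tf_idf`, `top_terms`), the sample documents, and the choice of raw counts with an unsmoothed natural-log IDF are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(docs):
    """Return one {term: TF x IDF weight} dict per document.

    TF is the raw count of the term in the document; IDF is
    log(N / df), where N is the number of documents and df is the
    number of documents containing the term (assumed variant).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


def top_terms(scores, k=3):
    # Highest-weighted terms per document; ties are broken alphabetically.
    return [sorted(s, key=lambda t: (-s[t], t))[:k] for s in scores]


# Made-up sample documents, for illustration only.
docs = [
    "term extraction ranks candidate terms by frequency",
    "database engines speed up frequency counting",
    "term weighting improves document retrieval",
]
scores = tf_idf(docs)
keywords = top_terms(scores)
```

Under this variant, a term that occurs in every document receives a weight of zero, which is the usual motivation for combining TF with IDF rather than ranking candidate keywords by frequency alone.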
Main Subjects
Information Technology and Computer Science
No. of Pages
142
Table of Contents
Table of contents.
Abstract.
Chapter one : Introduction.
Chapter two : Related work.
Chapter three : Automatic term extraction (ATE).
Chapter four : Automatic term extraction workbench (ATEWB) system description.
Chapter five : A comparative study.
Chapter six : Conclusions and future work.
References.
American Psychological Association (APA)
Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-303496
Modern Language Association (MLA)
Sabah, Yusuf. Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University. (2005).
https://search.emarefa.net/detail/BIM-303496
American Medical Association (AMA)
Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank)
https://search.emarefa.net/detail/BIM-303496
Language
English
Data Type
Arab Theses
Record ID
BIM-303496