Automatic term extraction using statistical techniques ; a comparative in-depth study and applications

Dissertant

Sabah, Yusuf

Thesis advisor

Abu al-Rubb, Haytham
Abu Zarr, Yusuf

University

Birzeit University

Faculty

Faculty of Engineering and Technology

Department

Department of Computer Science

University Country

Palestine (West Bank)

Degree

Master

Degree Date

2005

English Abstract

The idea of Information Retrieval (IR) emerged as the storage of cultural, social, and scientific information shifted from ink on paper to digital libraries distributed across international networks.

Typically, this information comprises material such as text, graphic documents (pictures, maps, technical drawings, etc.), sound, and moving images.

In an attempt to make such huge amounts of information efficiently retrievable, several techniques for Automatic Term Extraction (ATE) have been proposed.

This automatic procedure is considered the cornerstone of a wide range of applications, such as search engines, because manual production of keywords is highly labor-intensive.

To ensure precise information retrieval, the extracted keywords should accurately describe the contents of their documents.

To improve this operation, researchers have proposed many techniques for Automatic Term Extraction (ATE), also called Automatic Indexing; some use statistical techniques, while others use syntactic and probabilistic techniques.

This thesis is a comparative study of statistical techniques, intended to guide their use. It covers four techniques: Term Frequency (TF), Inverse Document Frequency (IDF), the combined Term Frequency-Inverse Document Frequency (TF×IDF), and the Term Discrimination Value Model (TDVM).
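As an illustration of the first three techniques named above, the following is a minimal sketch of TF, IDF, and TF×IDF weighting. The abstract does not state the exact formulas used in the thesis, so the raw-count TF, the natural-log IDF, the whitespace tokenizer, and the function names `tokenize`, `tf_idf`, and `top_terms` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, split on whitespace, keep alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        # Assumed weighting: tf * ln(N / df); a term found in every
        # document therefore gets weight 0.
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def top_terms(doc_weights, k=3):
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Under these assumptions, a term that is frequent in one document but rare across the collection ranks highest for that document, which is the behavior an index-term extractor would typically want.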

We have also developed a computational tool, the Automatic Term Extraction Workbench (ATEWB), for use in the comparison; three experiments specify the conditions under which each technique is most efficient and/or accurate.

In addition, this thesis aims to improve the efficiency of statistical techniques by using database engines to reduce the computation time of their algorithms, and to improve document retrieval by caching documents in the database.
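The idea of delegating the counting work to a database engine can be sketched as follows. The abstract does not name the engine or schema used in the thesis; SQLite, the `postings` table, and the helper names `build_index`, `document_frequencies`, and `term_frequencies` are hypothetical choices for illustration only.

```python
import sqlite3


def build_index(docs):
    """Load (doc_id, term) rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per term occurrence.
    conn.execute("CREATE TABLE postings (doc_id INTEGER, term TEXT)")
    rows = [(i, w) for i, d in enumerate(docs) for w in d.lower().split()]
    conn.executemany("INSERT INTO postings VALUES (?, ?)", rows)
    # An index on term lets the engine group and look up terms efficiently.
    conn.execute("CREATE INDEX idx_postings_term ON postings (term)")
    return conn


def document_frequencies(conn):
    """Number of distinct documents containing each term."""
    cur = conn.execute(
        "SELECT term, COUNT(DISTINCT doc_id) FROM postings GROUP BY term"
    )
    return dict(cur.fetchall())


def term_frequencies(conn, doc_id):
    """Raw count of each term within a single document."""
    cur = conn.execute(
        "SELECT term, COUNT(*) FROM postings WHERE doc_id = ? GROUP BY term",
        (doc_id,),
    )
    return dict(cur.fetchall())
```

In this sketch the aggregation runs inside the engine through `GROUP BY`, so the application code only reads back the finished counts.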

We tested our model on two collections: in the first experiment, a collection of abstracts of papers in the field of automatic term extraction, each with keywords supplied by its authors; and a collection of test documents available on web sites concerned with IR.

Main Subjects

Information Technology and Computer Science

No. of Pages

142

Table of Contents

Table of contents.

Abstract.

Chapter one : Introduction.

Chapter two : Related work.

Chapter three : Automatic term extraction (ATE).

Chapter four : Automatic term extraction workbench (ATEWB) system description.

Chapter five : A comparative study.

Chapter six : Conclusions and future work.

References.

American Psychological Association (APA)

Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank).
https://search.emarefa.net/detail/BIM-303496

Modern Language Association (MLA)

Sabah, Yusuf. Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University. (2005).
https://search.emarefa.net/detail/BIM-303496

American Medical Association (AMA)

Sabah, Yusuf. (2005). Automatic term extraction using statistical techniques ; a comparative in-depth study and applications. (Master's thesis). Birzeit University, Palestine (West Bank).
https://search.emarefa.net/detail/BIM-303496

Language

English

Data Type

Arab Theses

Record ID

BIM-303496