Enhanced Arabic root-based lemmatizer

Other Title(s)

ليميتيزر محسن للجذور العربية

Dissertant

Ata, Halah

Thesis advisor

al-Hammuz, Ahmad

University

Middle East University

Faculty

Faculty of Information Technology

Department

Computer Science Department

University Country

Jordan

Degree

Master

Degree Date

2020

English Abstract

Generating meaningful information is a major task for any natural language processing application. Differentiating between original words and affixes in the Arabic language is important but inherently complex, and stemmers and lemmatizers are among the most needed components in Arabic language processing applications.

The fundamental function of stemming and lemmatization is to reduce the morphological variants of a word to a common root or base form.

In this thesis, we propose a new rule-based lemmatizer that aims to enhance natural language processing applications for the Arabic language by implementing well-defined rules that find the word lemma without using a dictionary.

Our proposed model, called the “T’assel lemmatizer”, is the first lemmatizer that exploits the most frequent extra letters in the word, based on priorities established according to the extra-letter groups.

The dataset is a set of 480 proverbs in Standard Arabic, consisting of 2,493 words, of which 1,637 are unique. The accuracy of the T’assel lemmatizer was 74.11%.
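
The abstract describes a dictionary-free, rule-based approach that removes extra letters in priority order. This record does not list the actual rules or letter groups, so the following is only a minimal sketch of the general idea. The affix lists, their ordering, the minimum length, and the function name `strip_extra_letters` are all assumptions made for illustration and are not taken from the thesis.

```python
# Illustrative sketch only: the affix lists and their priority order below
# are assumptions, not the T'assel lemmatizer's actual rules.

# Candidate prefixes, tried from highest to lowest (assumed) priority.
PREFIXES_BY_PRIORITY = ["وال", "بال", "ال", "و", "ي", "ت", "م"]

# Candidate suffixes, tried from highest to lowest (assumed) priority.
SUFFIXES_BY_PRIORITY = ["ون", "ات", "ين", "ها", "ة", "ي"]

# Assumed guard: never shorten a word below three letters.
MIN_LEMMA_LENGTH = 3


def strip_extra_letters(word: str) -> str:
    """Remove at most one prefix and one suffix, chosen by priority."""
    # Strip the first prefix that matches and leaves a long enough remainder.
    for prefix in PREFIXES_BY_PRIORITY:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_LEMMA_LENGTH:
            word = word[len(prefix):]
            break

    # Strip the first suffix that matches and leaves a long enough remainder.
    for suffix in SUFFIXES_BY_PRIORITY:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_LEMMA_LENGTH:
            word = word[:-len(suffix)]
            break

    return word


if __name__ == "__main__":
    for sample in ["المعلمون", "والكتاب", "ولد"]:
        print(sample, "->", strip_extra_letters(sample))
```

The length guard shows why ordering and constraints matter in a rule-based design: without it, a word such as "ولد" would lose its first letter even though that letter is part of the word itself.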

Main Topic

Information Technology and Computer Science

No. of Pages

48

Table of Contents

Table of contents.

Abstract.

Abstract in Arabic.

Chapter One: Introduction.

Chapter Two: Background and literature review.

Chapter Three: Methodology and the proposed model.

Chapter Four: Experimental results and discussion.

Chapter Five: Conclusion and future work.

References.

American Psychological Association (APA)

Ata, Halah. (2020). Enhanced Arabic root-based lemmatizer (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-970872

Modern Language Association (MLA)

Ata, Halah. Enhanced Arabic root-based lemmatizer. Master's thesis. Middle East University, 2020.
https://search.emarefa.net/detail/BIM-970872

American Medical Association (AMA)

Ata, Halah. (2020). Enhanced Arabic root-based lemmatizer (Master's thesis). Middle East University, Jordan.
https://search.emarefa.net/detail/BIM-970872

Language

English

Data Type

Arab Theses

Record ID

BIM-970872