Modern standard Arabic grammar automatic extraction from Penn Arabic Treebank using natural language toolkit

Other Title(s)

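استخلاص قواعد النحو لتراكيب جمل اللغة العربية المعاصرة آليا باستخدام عينة لغوية من Penn Arabic Treebank باستخدام NLTK [English: Automatic extraction of grammar rules for Modern Standard Arabic sentence structures using a language sample from the Penn Arabic Treebank and NLTK]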

Joint Authors

Abd al-Halim, Amirah
al-Ansari, Samih

Source

The Egyptian Journal of Language Engineering

Issue

Vol. 5, Issue 1 (30 Apr. 2018), pp. 1-10, 10 p.

Publisher

Egyptian Society of Language Engineering

Publication Date

2018-04-30

Country of Publication

Egypt

No. of Pages

10

Main Subjects

Information Technology and Computer Science

Abstract EN

This paper presents a methodology for rule-based, bottom-up parsing of Modern Standard Arabic (MSA) using the Context-Free Grammar (CFG) formalism in a Phrase Structure Grammar (PSG) representation, where the grammar is automatically extracted from a syntactically annotated corpus. The extracted grammar is used to build an automatic lexicon and a grammar rules module.
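
As a rough illustration of what extracting a CFG from a syntactically annotated corpus can look like, the sketch below reads Penn-style bracketed trees, collects one production per internal node, and splits the result into a lexicon and a set of grammar rules. It uses only the Python standard library (NLTK offers comparable functionality through `Tree.fromstring` and `Tree.productions`). The two toy trees, their labels, the transliterated words, and the function names are all assumptions made for this example; they are not PATB data and not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy trees in Penn-style bracketing (NOT taken from the PATB).
TOY_TREEBANK = [
    "(S (VP (VBD qaraOa) (NP (NN Al-walad)) (NP (NN Al-kitAb))))",
    "(S (VP (VBD nAma) (NP (NN Al-walad))))",
]


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return read_node()


def extract_productions(node):
    """Yield one (lhs, rhs) production per internal node, top-down."""
    label, children = node
    # Terminals are wrapped in double quotes to tell them apart from categories.
    rhs = tuple(c[0] if isinstance(c, tuple) else f'"{c}"' for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from extract_productions(child)


def is_lexical(rule):
    """A lexical rule rewrites a category as a single quoted word."""
    _lhs, rhs = rule
    return len(rhs) == 1 and rhs[0].startswith('"')


# Count every production observed in the toy treebank.
rule_counts = Counter()
for tree_text in TOY_TREEBANK:
    rule_counts.update(extract_productions(parse_bracketed(tree_text)))

lexicon = {rule for rule in rule_counts if is_lexical(rule)}
grammar_rules = {rule for rule in rule_counts if not is_lexical(rule)}

for lhs, rhs in sorted(grammar_rules):
    print(f"{lhs} -> {' '.join(rhs)}  (seen {rule_counts[(lhs, rhs)]}x)")
```

Keeping the counts alongside the rules is a deliberate choice here: the same table can later be turned into rule probabilities without re-reading the corpus.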

The extracted CFG is then transformed into a Probabilistic Context-Free Grammar (PCFG), whose rule probabilities are also calculated automatically and which could be used in a hybrid approach.
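
The abstract does not say how the rule probabilities are calculated. A common choice, assumed here, is relative-frequency estimation: the probability of a rule A -> β is its count divided by the total count of all rules with the same left-hand side A. The sketch below applies that estimate to a small table of hypothetical rule counts; the counts and the function name `estimate_pcfg` are illustrative only.

```python
from collections import Counter, defaultdict


def estimate_pcfg(rule_counts):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {
        rule: count / lhs_totals[rule[0]]
        for rule, count in rule_counts.items()
    }


# Hypothetical counts, as they might come out of a treebank extraction step.
rule_counts = Counter({
    ("S", ("VP",)): 2,
    ("VP", ("VBD", "NP", "NP")): 1,
    ("VP", ("VBD", "NP")): 1,
    ("NP", ("NN",)): 3,
})

pcfg = estimate_pcfg(rule_counts)

for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

By construction, the probabilities of all rules sharing a left-hand side sum to 1, which is the defining property of a PCFG.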

The corpus used is the Penn Arabic Treebank (PATB), and the algorithm is implemented with the Natural Language Toolkit (NLTK). The parser showed that automatic grammar extraction improved the grammar-building phase in both coverage of structures and time needed, but the grammar still requires manual addition of further constraints.
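
The abstract does not describe the parsing algorithm beyond calling it bottom-up. To show the general idea, here is a minimal chart-based recognizer that builds constituents from the words upward, shortest spans first, and accepts a sentence if the start symbol covers the whole input. It is a generic textbook-style sketch, not the authors' parser; the rules, the lexicon, and the name `recognize` are hypothetical.

```python
def recognize(words, rules, lexicon, start="S"):
    """Bottom-up chart recognition for a CFG with unary and n-ary rules.

    rules:   list of (lhs, rhs) pairs, rhs being a non-empty tuple of categories
    lexicon: dict mapping a word to the set of its possible categories
    """
    n = len(words)
    if n == 0:
        return False
    chart = {}  # (i, j) -> set of categories that span words[i:j]

    def covers(rhs, i, j):
        """True if the categories in rhs can tile words[i:j] left to right."""
        first, rest = rhs[0], rhs[1:]
        if not rest:
            return first in chart.get((i, j), set())
        return any(
            first in chart.get((i, k), set()) and covers(rest, k, j)
            for k in range(i + 1, j)
        )

    # Fill the chart from the shortest spans to the longest.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)] = set()
            if length == 1:
                cell.update(lexicon.get(words[i], ()))
            # Repeat until stable so chains of unary rules (NP -> NN) apply.
            changed = True
            while changed:
                changed = False
                for lhs, rhs in rules:
                    if lhs not in cell and covers(rhs, i, j):
                        cell.add(lhs)
                        changed = True
    return start in chart[(0, n)]


# Hypothetical toy grammar and lexicon (not extracted from the PATB).
RULES = [
    ("S", ("VP",)),
    ("VP", ("VBD", "NP", "NP")),
    ("VP", ("VBD", "NP")),
    ("NP", ("NN",)),
]
LEXICON = {
    "qaraOa": {"VBD"},
    "nAma": {"VBD"},
    "Al-walad": {"NN"},
    "Al-kitAb": {"NN"},
}

accepted = recognize(["qaraOa", "Al-walad", "Al-kitAb"], RULES, LEXICON)
rejected = recognize(["Al-walad", "nAma"], RULES, LEXICON)
print(accepted, rejected)
```

A purely extracted grammar of this kind accepts whatever its rules license, which is one way to see why manually added constraints can still be needed.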

Automatic grammar extraction can enhance rule-based grammar parsers, and it will enable a new paradigm of statistically directed symbolic parsing.

American Psychological Association (APA)

Abd al-Halim, Amirah & al-Ansari, Samih. 2018. Modern standard Arabic grammar automatic extraction from Penn Arabic Treebank using natural language toolkit. The Egyptian Journal of Language Engineering, Vol. 5, no. 1, pp. 1-10.
https://search.emarefa.net/detail/BIM-941786

Modern Language Association (MLA)

Abd al-Halim, Amirah & al-Ansari, Samih. Modern standard Arabic grammar automatic extraction from Penn Arabic Treebank using natural language toolkit. The Egyptian Journal of Language Engineering Vol. 5, no. 1 (Apr. 2018), pp. 1-10.
https://search.emarefa.net/detail/BIM-941786

American Medical Association (AMA)

Abd al-Halim, Amirah & al-Ansari, Samih. Modern standard Arabic grammar automatic extraction from Penn Arabic Treebank using natural language toolkit. The Egyptian Journal of Language Engineering. 2018. Vol. 5, no. 1, pp. 1-10.
https://search.emarefa.net/detail/BIM-941786

Data Type

Journal Articles

Language

English

Record ID

BIM-941786