Development of a Hindi named entity recognition system without using manually annotated training corpus

Co-authors

Saha, Sujan
Majumder, Mukta

Source

The International Arab Journal of Information Technology

Issue

Vol. 15, No. 6 (30 November 2018), 10 p.

Publisher

Zarqa University

Publication date

2018-11-30

Country of publication

Jordan

Number of pages

10

Main subjects

Information Technology and Computer Science

Abstract (EN)

A machine-learning-based approach to named entity recognition (NER) requires a sufficiently large annotated corpus to train the classifier.

Other NER resources like gazetteers are also required to make the classifier more accurate.

Yet in many languages and domains, relevant NER resources are still not available.

Creating adequate and relevant resources is costly and time-consuming.

However, a large amount of resources and several NER systems are available for resource-rich languages such as English.

Suitable language adaptation techniques, the NER resources of a resource-rich language, and minimally supervised learning might help in such scenarios.

In this paper, we study a few such techniques in order to develop a Hindi NER system.

Without using any Hindi NE-annotated corpus, the developed system achieves a reasonable F-measure of 73.87.

American Psychological Association (APA) citation style

Saha, Sujan & Majumder, Mukta. 2018. Development of a Hindi named entity recognition system without using manually annotated training corpus. The International Arab Journal of Information Technology, Vol. 15, no. 6.
https://search.emarefa.net/detail/BIM-874018

Modern Language Association (MLA) citation style

Saha, Sujan & Majumder, Mukta. Development of a Hindi named entity recognition system without using manually annotated training corpus. The International Arab Journal of Information Technology Vol. 15, no. 6 (Nov. 2018).
https://search.emarefa.net/detail/BIM-874018

American Medical Association (AMA) citation style

Saha, Sujan & Majumder, Mukta. Development of a Hindi named entity recognition system without using manually annotated training corpus. The International Arab Journal of Information Technology. 2018. Vol. 15, no. 6.
https://search.emarefa.net/detail/BIM-874018

Data type

Articles

Text language

English

Notes

Includes bibliographical references.

Record number

BIM-874018