UCOM offline dataset-an Urdu handwritten dataset generation

Co-authors

Bin Ahmad, Sad
Naz, Saidah
Swati, Salah al-Din
Razzak, Muhammad
Umar, Arif
Khan, Akbar

Source

The International Arab Journal of Information Technology

Issue

Vol. 14, no. 2 (31 March 2017), 7 pp.

Publisher

Zarqa University

Publication date

2017-03-31

Country of publication

Jordan

Number of pages

7

Main subjects

Information Technology and Computer Science

Abstract (EN)

A benchmark database is essential for the efficient and robust development of character recognition systems.

Unfortunately, no comprehensive handwritten dataset exists for the Urdu language that could be used to compare state-of-the-art techniques in optical character recognition (OCR).

In this paper, we present a new, publicly available dataset comprising 600 pages of handwritten Urdu text in Nasta’liq style, together with detailed ground truth for evaluating handwritten Urdu character recognition.

The dataset contains text lines written in Nasta’liq style on A4 paper by a limited number of individuals.

Each handwritten page was scanned, and the text lines were then segmented.

The UCOM database covers all Urdu characters and ligatures in their different variations, in addition to Urdu numerals.

In this dataset, a ligature is taken to consist of up to five characters.

The UCOM dataset can be used for handwritten character recognition as well as for writer identification.

We propose using Recurrent Neural Networks (RNNs) and evaluate their strength on sample text lines from the UCOM offline database.

American Psychological Association (APA) citation style

Bin Ahmad, Sad & Naz, Saidah & Swati, Salah al-Din & Razzak, Muhammad & Umar, Arif & Khan, Akbar. 2017. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology, Vol. 14, no. 2.
https://search.emarefa.net/detail/BIM-693681

Modern Language Association (MLA) citation style

Khan, Akbar…[et al.]. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology Vol. 14, no. 2 (2017).
https://search.emarefa.net/detail/BIM-693681

American Medical Association (AMA) citation style

Bin Ahmad, Sad & Naz, Saidah & Swati, Salah al-Din & Razzak, Muhammad & Umar, Arif & Khan, Akbar. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology. 2017. Vol. 14, no. 2.
https://search.emarefa.net/detail/BIM-693681

Document type

Articles

Text language

English

Notes

Includes appendices.

Record number

BIM-693681