UCOM offline dataset-an Urdu handwritten dataset generation

Joint Authors

Bin Ahmad, Sad
Naz, Saidah
Swati, Salah al-Din
Razzak, Muhammad
Umar, Arif
Khan, Akbar

Source

The International Arab Journal of Information Technology

Issue

Vol. 14, Issue 2 (31 Mar. 2017), 7 p.

Publisher

Zarqa University

Publication Date

2017-03-31

Country of Publication

Jordan

No. of Pages

7

Main Subjects

Information Technology and Computer Science

Abstract EN

A benchmark database is essential for the efficient and robust development of character recognition systems.

Unfortunately, there is no comprehensive handwritten dataset for the Urdu language that can be used to compare state-of-the-art techniques in the field of optical character recognition.

In this paper, we present a new and publicly available dataset comprising 600 pages of handwritten Urdu text written in Nasta’liq style, together with detailed ground truth for the evaluation of handwritten Urdu character recognition.

This dataset contains text lines written in Nasta’liq style by a limited number of individuals on A4-size paper.

The handwritten pages were scanned, and the text lines were then segmented.

The UCOM database covers all Urdu characters and ligatures with different variations, in addition to Urdu numeric data.

In this dataset, we consider a ligature to consist of up to five characters.

The UCOM dataset can be used for handwritten character recognition as well as writer identification.

We proposed the use of Recurrent Neural Networks (RNN) and evaluated their strength on sample text lines from the UCOM offline database.

American Psychological Association (APA)

Bin Ahmad, Sad & Naz, Saidah & Swati, Salah al-Din & Razzak, Muhammad & Umar, Arif & Khan, Akbar. 2017. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology, Vol. 14, no. 2.
https://search.emarefa.net/detail/BIM-693681

Modern Language Association (MLA)

Khan, Akbar … [et al.]. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology, Vol. 14, no. 2 (2017).
https://search.emarefa.net/detail/BIM-693681

American Medical Association (AMA)

Bin Ahmad, Sad & Naz, Saidah & Swati, Salah al-Din & Razzak, Muhammad & Umar, Arif & Khan, Akbar. UCOM offline dataset-an Urdu handwritten dataset generation. The International Arab Journal of Information Technology. 2017. Vol. 14, no. 2.
https://search.emarefa.net/detail/BIM-693681

Data Type

Journal Articles

Language

English

Notes

Includes appendices.

Record ID

BIM-693681