ADL: A New Dataset of Select Arabic-Derived Letters for Handwritten Character Recognition

Authors

  • Mouhssine EL ATILLAH, Computer Systems Engineering, Mathematics and Applications (ISIMA), Polydisciplinary Faculty of Taroudant, University Ibn Zohr, Taroudant, Morocco https://orcid.org/0000-0002-3431-8143

DOI:

https://doi.org/10.24996/ijs.2025.66.10.43

Keywords:

Arabic handwritten characters, ADL dataset, Arabic derived letters, Optical character recognition, Arabic character recognition system, SVM, CNN, ViT

Abstract

Recognition of Arabic text and characters is among the most challenging problems in optical character recognition (OCR) because of the complex nature of the letters and the variation in their forms. This paper presents ADL, a new dataset of Arabic-derived letters: Che (چ), Ngain (ڠ), Pe (پ), Ve (ڤ), Zhe (ژ), and the three versions of Gaf (گ, ڭ, and ݣ). The dataset consists of 55,440 images taken from scanned handwritten pages produced by 30 participants of different ages, which provides demographic and stylistic diversity. To demonstrate its practical usability, the dataset was evaluated with three models: a Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and a Vision Transformer (ViT). Each letter is represented in several positional forms (isolated, initial, medial, and final) to reflect real-world usage, which increases the value of the dataset for machine learning applications. The dataset was enriched with data augmentation based on random rotation, horizontal shift, and zooming, with nearest-neighbor interpolation used to fill empty pixels, so that each character is represented in a balanced way while its essential structural elements are preserved. As the first structured resource on some derived letters of the Arabic language, the dataset aims to fill a crucial gap in datasets focused on Arabic script and to advance research on handwritten character recognition. By supporting better recognition of these non-original derived letters, especially in OCR systems, it has important implications for linguistic research, practical applications, and advances in automated text processing systems.
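
The augmentation described in the abstract (random rotation, horizontal shift, and zooming, with empty pixels filled from the nearest neighbor) can be illustrated with a minimal pure-Python sketch. This is not the author's implementation: the function names `augment` and `random_augment`, the parameter ranges, and the reading of "nearest neighbor" as replication of the closest edge pixel are all assumptions made for illustration.

```python
import math
import random


def augment(image, angle_deg, shift_x, zoom):
    """Rotate, shift horizontally, and zoom a 2D grayscale image.

    `image` is a list of equal-length rows. Output pixels whose source
    falls outside the image take the value of the nearest edge pixel
    (an assumed reading of the abstract's "nearest neighbor" filling).
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: undo the shift, then the zoom, then the
            # rotation about the image centre.
            dx = (x - shift_x - cx) / zoom
            dy = (y - cy) / zoom
            src_x = cos_t * dx + sin_t * dy + cx
            src_y = -sin_t * dx + cos_t * dy + cy
            # Nearest-neighbor sampling, clamped to the image borders.
            sx = min(max(int(round(src_x)), 0), w - 1)
            sy = min(max(int(round(src_y)), 0), h - 1)
            out[y][x] = image[sy][sx]
    return out


def random_augment(image, rng, max_angle=10.0, max_shift=0.1, zoom_range=0.1):
    """Draw random parameters (hypothetical ranges) and apply `augment`."""
    w = len(image[0])
    angle = rng.uniform(-max_angle, max_angle)
    shift = rng.uniform(-max_shift, max_shift) * w
    zoom = rng.uniform(1.0 - zoom_range, 1.0 + zoom_range)
    return augment(image, angle, shift, zoom)
```

Calling `random_augment(img, random.Random(0))` repeatedly on the samples of an under-represented class is one way such a procedure could be used to balance class counts; the actual ranges and balancing strategy used for ADL are not stated in the abstract.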

Published

2025-10-30

Section

Computer Science

How to Cite

[1]
M. EL ATILLAH, “ADL: A New Dataset of Select Arabic-Derived Letters for Handwritten Character Recognition”, Iraqi Journal of Science, vol. 66, no. 10, pp. 4625–4643, Oct. 2025, doi: 10.24996/ijs.2025.66.10.43.
