A Design of a Hybrid Algorithm for Optical Character Recognition of Online Hand-Written Arabic Alphabets

The growing relevance of printed and digitized handwritten characters has created a need for improved automatic character recognition in Optical Character Recognition (OCR). Among handwritten scripts, Arabic receives special attention because of its distinctive nature and the inherent challenges it poses for recognition systems. This distinctiveness, together with differences in personal writing style and proficiency, reduces the effectiveness of online handwritten recognition systems for Arabic. Building on the limitations and scope of previous related studies, this research studies the recognition of isolated Arabic characters through the identification of their features and dots, with a view to producing an efficient online recognition system for isolated handwritten Arabic characters. It proposes a hybrid of a decision tree and an Artificial Neural Network (ANN), rather than combining either with other algorithms as in previous studies. The proposed recognition process has four main steps with associated sub-steps. The results show that the proposed method achieved the highest performance at 96.7%, whereas the benchmark methods, EDMS and Naeimizaghiani, achieved 68.88% and 78.5% respectively. With the proposed features, ANN had the best recognition rate at 98.8%, while the best rate for the decision tree was 97.2%.


Introduction
Optical character recognition (OCR) is one of the applied areas of pattern recognition [1]. Pattern recognition uses machine learning models and various mathematical, statistical and heuristic techniques to detect patterns, and provides a computational framework for implementing recognition systems [1,2]. OCR is employed in document scanners, mostly in banking and postal applications. However, the growing relevance of printed and digitized handwritten characters now calls for improved automatic character recognition [3,4]. This is evolving into a new research area, handwritten character recognition, within the pattern recognition body of knowledge [5,6]. The reasons for this new computational attention to handwritten character recognition are not far-fetched. First, OCR is fundamentally used to recognize offline text or characters, whereas online text is captured by sensing the movements of a pen [7,8]. OCR systems must therefore be improved to handle the text-to-digital character formats available in online systems. Second, handwritten character recognition is now used in automated mail sorting and check processing, and improving its performance is important [3,9]. This applies to all scripts, and Arabic attracts special attention. The Arabic alphabet consists of 28 characters, which form words written from right to left [8,10]. Handwritten Arabic characters demand research that leads to new recognition algorithms and systems because of the distinctive characteristics of the Arabic alphabet. Arabic text is cursive and written from right to left [9,11]. It contains dots and other small marks that can completely change the meaning if they are wrongly captured by automated recognition systems [10,12]. Also, Arabic letters change in structure and shape depending on their position in the word.
An Arabic letter has distinct shapes and forms at the beginning, middle, and end of a word [12,13]. The distinctiveness of Arabic characters, and especially the difficulty of recognizing them automatically, which is compounded in online handwriting, necessitated this study [14]. Arabic character recognition can be difficult, especially for handwriting, where the inclusion and location of dots vary and are imprecise. The same Arabic character, when written by different persons, can take different shapes and sizes as a result of individual writing styles [15,16,17]. To address this, automated character recognition must be able to recognize the component edges and dots.
This research studied the recognition of isolated Arabic characters by identifying their features and dots, with a view to producing an efficient online recognition system for isolated handwritten Arabic characters. To achieve this, a hybrid algorithm of a decision tree and an artificial neural network is proposed. The second section of this paper reviews related works. The third section presents the proposed hybrid algorithm for automated recognition of handwritten Arabic characters, while the fourth section presents the experimental results and findings. The fifth section presents the results and discussion of this study, and the last section concludes the paper.

Related Works
The holistic and analytical approaches are the leading strategies for solving character recognition problems [18,19,20]. Holistic approaches consider the character image globally, so that representation is at the level of words rather than individual characters [21,22]. This approach is widely used for text recognition problems and is based on dynamic programming [23,24]. The analytical approach, on the other hand, treats words as a series of small units. These units can be easily linked with characters to represent the vocabulary [25]. The analytical methods are further divided into two subclasses, explicit segmentation and implicit segmentation [26]. In the former, grapheme or pseudo-letter segmentation takes place prior to recognition, whereas the latter performs segmentation and recognition concurrently [27]. The analytical methods are the most popular current approaches because they offer a simple way of building character recognition systems and of understanding the underlying algorithms. Past studies on Arabic character recognition include [13,14,28]. Based on hidden Markov models and structural feature extraction, Khorsheed [28] proposed a technique for online handwritten recognition of Arabic script that significantly extended the algorithms of El-Sheikh and Guindi [29] and El-Sheikh and El-Taweel [30] for recognizing Arabic handwritten characters and cursive words. Khorsheed [28] assumed that the characters result from a reliable segmentation phase and that their positions are known a priori. The character shapes are named initial, medial, final, and isolated according to their position in the word. These are further categorized into four subsets depending on the number of strokes in the character. Mezghan [31] later introduced online recognition of Arabic characters using Kohonen maps with corresponding confusion matrices.
The advantage is that it prunes error-causing nodes from the characters and subsequently combines them. Amin and Singh [8] and Amin [32] presented a technique for the recognition of Arabic words and Chinese characters using the C4.5 machine learning system. The technique involves a sequence of digitization, pre-processing, feature extraction, and classification [3]. However, they proposed an Arabic OCR system based on recognition-based segmentation, which is part of the pre-processing and feature extraction in Amin [32]. Bataineh and Sheikh [33] also presented a method for recognizing Arabic handwritten characters. Their approach employed global feature extraction based on texture analysis, using the extended EDMS method, namely EDM1. Although these studies used datasets and training sets of varying sizes, they showed the efficacy of the decision tree and the artificial neural network in their respective systems. This study proposes a hybrid of a decision tree and an artificial neural network, rather than combining either with other algorithms as in previous studies, with a view to improving online Arabic character recognition.

The Proposed Hybrid Algorithm
First, the 3×3 edge direction matrix (EDM1) is created for the first-order relationship, with each cell corresponding to an angle between 0° and 315°. Second, the number of occurrences of each value is calculated and used to determine the relationship of the scoped pixel in the edge image Edge(x, y) [1]. The algorithm is shown in the Figure

Hussein and Hussein
Iraqi Journal of Science, 2019, Vol.60, No.9, pp: 2067-2079
Each pixel in the edge image is related to two pixels. As shown in Figure-2, for example, the scoped pixel presents 180° and 45° for X1 and X2 respectively, so there are two relationships for each pixel in EDM1. In the second-order relationship, each pixel presents only one relationship. First, the 3×3 edge direction matrix (EDM2) is created. Subsequently, the EDM1 values are sorted in descending order to determine the importance of each relationship in Edge(x, y). The resulting order is presented in Table-1.
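As an illustration of the first-order counting described above, the following Python sketch builds an EDM1-style matrix from a binary edge image. The function names, the mapping from angles to neighbour offsets, and the choice to store the number of edge pixels in the centre cell are assumptions made for this example; they are not taken from the original algorithm figure.

```python
# Angle in degrees -> (row offset, column offset) of the neighbour.
# Rows grow downward, so 90 degrees points to the row above.
# This layout is an assumption made for illustration.
OFFSETS = {
    0: (0, 1),
    45: (-1, 1),
    90: (-1, 0),
    135: (-1, -1),
    180: (0, -1),
    225: (1, -1),
    270: (1, 0),
    315: (1, 1),
}


def edm1_counts(edge):
    """Count, per angle, how many edge pixels have an edge neighbour in that direction."""
    rows = len(edge)
    cols = len(edge[0]) if rows else 0
    counts = {angle: 0 for angle in OFFSETS}
    for r in range(rows):
        for c in range(cols):
            if not edge[r][c]:
                continue  # only edge pixels are scoped
            for angle, (dr, dc) in OFFSETS.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and edge[nr][nc]:
                    counts[angle] += 1
    return counts


def edm1_matrix(edge):
    """Arrange the counts in a 3x3 matrix; the centre cell holds the edge pixel count (assumption)."""
    counts = edm1_counts(edge)
    n_edge = sum(1 for row in edge for value in row if value)
    return [
        [counts[135], counts[90], counts[45]],
        [counts[180], n_edge, counts[0]],
        [counts[225], counts[270], counts[315]],
    ]


# A three-pixel horizontal stroke: only the 0 and 180 degree cells are non-zero.
stroke = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
for matrix_row in edm1_matrix(stroke):
    print(matrix_row)
```

In a thin stroke, every interior pixel contributes two relationships, one to each neighbour, which matches the two relationships per pixel described for EDM1.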

Table 1-The order of importance of the angles
The most important relationship of the scoped pixel in Edge(x, y) is taken, and the number of occurrences of each value is counted in EDM2. The selection follows this order: (a) when more than one angle has the same number of occurrences, the smaller angle is selected first; (b) the reversal is then selected next. Figure-3 shows the algorithm for the second-order (EDM2) relationship.
Homogeneity: This feature is the proportion of each direction relative to all directions present in the edge character:
Homogeneity(θ) = EDM1(x, y) / Σ EDM1(x, y).
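The ordering rule and the homogeneity feature above can be sketched in Python as follows. This is a minimal illustration that assumes the EDM1 values are available as a dictionary from angle to occurrence count; the function names and the example counts are hypothetical. Only the descending sort and tie-break rule (a) are modelled; rule (b), the reversal step, is not specified in enough detail here to implement and is left out.

```python
def rank_directions(counts):
    """Order angles by occurrence count, highest first; ties go to the smaller angle."""
    return sorted(counts, key=lambda angle: (-counts[angle], angle))


def homogeneity(counts):
    """Share of each direction among all counted directions (values sum to 1)."""
    total = sum(counts.values())
    if total == 0:
        # An empty edge image has no directions; report zero for every angle.
        return {angle: 0.0 for angle in counts}
    return {angle: counts[angle] / total for angle in counts}


# Hypothetical occurrence counts, for illustration only.
example_counts = {0: 4, 45: 1, 90: 4, 135: 0, 180: 3, 225: 0, 270: 2, 315: 1}
print(rank_directions(example_counts))
print(homogeneity(example_counts)[0])
```

With these counts, 0° and 90° tie on occurrences, so 0° is ranked first because it is the smaller angle.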
In addition, for the extraction of the Arabic characters' features and the associated statistical analysis, some geometrical features are used: the width and height of the horizontal and vertical base, the number of occurrences in the (H, V) projections, and the width of the dot. The proposed flowchart for geometrical feature extraction is presented in Figure-4.
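A rough Python sketch of such geometrical measurements on a binary character image is given below. It assumes that "number of occurrences" in a projection means the number of separate non-empty runs, so that a dot above a character body appears as a second run in the horizontal projection. This interpretation, like the function names, is an assumption. The width of the dot is not computed here.

```python
def projections(image):
    """Return (horizontal, vertical) projection profiles of a binary image."""
    horizontal = [sum(row) for row in image]        # ink per row
    vertical = [sum(col) for col in zip(*image)]    # ink per column
    return horizontal, vertical


def count_runs(profile):
    """Number of separate non-zero runs in a projection profile."""
    runs, inside = 0, False
    for value in profile:
        if value and not inside:
            runs += 1
        inside = bool(value)
    return runs


def bounding_size(image):
    """Width and height of the smallest box that contains all ink."""
    horizontal, vertical = projections(image)
    rows = [i for i, value in enumerate(horizontal) if value]
    cols = [i for i, value in enumerate(vertical) if value]
    if not rows:
        return 0, 0
    return cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1


# A toy shape: one dot pixel on the first row, a body on the last two rows.
shape = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
h_profile, v_profile = projections(shape)
print(count_runs(h_profile), count_runs(v_profile), bounding_size(shape))
```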

Experimental Results and Findings
In this study, the recognition process has four main steps with associated sub-steps. The steps are (a) pre-processing, (b) feature extraction (EDMS, geometrical and structural features), (c) a rule-based technique for recognition, and (d) dataset collection for testing the proposed method. Table-2 presents the recognition result for each character and the total recognition rate for all characters. The table shows the number of characters that reached a 100% recognition rate for all users and the number of characters with a low recognition rate. Based on the correct-recognition rates for each character shown in Table-, Figure-5 shows the recognition results per character for all users, each of whom was tested with the handwritten characters three times. The third column in these tables represents the number of characters that were correctly recognized. The fourth column displays the number of characters that were incorrectly recognized.
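The relation between the correct and incorrect counts and the reported rates can be sketched as follows. The function names and the example counts are illustrative assumptions and do not reproduce the values in the tables.

```python
def recognition_rate(correct, incorrect):
    """Recognition rate in percent from the correct and incorrect counts."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no samples for this character")
    return 100.0 * correct / total


def summarize(results):
    """results maps a character to (correct, incorrect) counts."""
    per_char = {ch: recognition_rate(c, i) for ch, (c, i) in results.items()}
    total_correct = sum(c for c, _ in results.values())
    total = sum(c + i for c, i in results.values())
    overall = 100.0 * total_correct / total
    fully_recognized = [ch for ch, rate in per_char.items() if rate == 100.0]
    return per_char, overall, fully_recognized


# Made-up counts for two characters, for illustration only.
example = {"ا": (30, 0), "ب": (27, 3)}
print(summarize(example))
```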

Results and Discussion
Evaluation of the Online-OCR Based on Proposed Feature Extraction Technique
The dataset was split into two groups, namely training and testing datasets, in order to evaluate the proposed feature extraction technique for online OCR. In the experiment, 60% and 70% were used as benchmarks for the training dataset. According to the experimental results, the proposed method has higher accuracy rates than the EDMS and Naeimizaghiani feature extraction methods. A decision tree with a 60% training dataset is shown to have the best performance when compared with EDMS at 68.5% and the Naeimizaghiani method at 79.8% when the decision tree classifier training dataset is 62%. As shown in Table-4 and Figure-6, the proposed feature extraction method achieved the highest performance, about 98.85%, with a 61% training dataset and the artificial neural network.
The experiments on the 61% training dataset with the decision tree were repeated five times in order to verify the performance observed in the previous experiments. The decision tree was used for the repetition because its processing time is much shorter than that of the ANN, while its performance is almost the same. The average of the results is presented in Table-5 and Figure-7. The results show that the proposed method achieved the highest performance at 96.7%, whereas the EDMS method achieved 68.88% and the Naeimizaghiani method achieved 78.5%. The standard deviations for the five experiments are given in Table-6. The proposed method obtained a lower standard deviation than the EDMS method, which was 6.0, and the Naeimizaghiani method, which was 2.14. Therefore, the features of the proposed method are more effective than those of the other methods.
Table 6-The standard deviation for the five experiments' results
The detailed results for each class of the higher experimental results are shown in Table-7. For EDMS, the highest accuracy, 86%, was achieved with the characters (ا, ت, خ, ذ, ش, ف, ق, ك, ل, ي). The lowest accuracy, 55%, was obtained with the characters (ب, ث, ج, ح, د, ر, ز, س, ص, ض, ط, ظ, ع, غ, م, ن, ه, و). Taking these results into account, the proposed method obtained a higher accuracy rate than EDMS and Naeimizaghiani for each class. Based on Table-4 and Table-7, the proposed method produced the best accuracy rate, about 96.7%, compared with the EDMS and Naeimizaghiani methods using the ANN and decision tree classifiers.
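The averaging over five repeated experiments and the standard deviation discussed above can be sketched in Python as follows. The use of the sample standard deviation is an assumption, since the text does not state which estimator was used. The accuracy values are placeholders and are not results from this study.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of accuracies from repeated runs."""
    if len(accuracies) < 2:
        raise ValueError("at least two runs are needed for a standard deviation")
    return mean(accuracies), stdev(accuracies)


# Placeholder accuracies for five runs; illustrative only.
example_runs = [90.0, 92.0, 94.0, 96.0, 98.0]
run_mean, run_sd = summarize_runs(example_runs)
print(f"mean = {run_mean:.2f}%, standard deviation = {run_sd:.2f}")
```

A lower standard deviation across the runs indicates that a feature set gives more stable accuracy when the training and testing samples change.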

Evaluation of the Online-OCR Based on Proposed Rule-based Technique
In the experiments conducted, the performance of the proposed rule-based technique was evaluated by comparing it with the different classification methods used in OCR. The artificial neural network (ANN) and decision tree classifiers are used in combination in this study, as against their single usage in related previous OCR studies. According to the literature review, these are the best classification methods employed in handwriting recognition. The decision tree and the multilayer artificial neural network are applied to the proposed feature extraction described earlier to find the best performance. The dataset was split into training and testing datasets for the experiment: 60% is taken as the benchmark for the training dataset, while 40% is used for the testing dataset. The experiment showed that ANN has the best performance, with a recognition rate of 98.8%, while the decision tree achieved 97.2%. As shown in Table-8 and Figure-8, the introduced rule-based method achieves a high recognition rate of 97%. According to the results, the proposed method achieved competitive performance with the ANN and decision tree methods in every experiment.
Table 8-The classification results using the proposed feature extraction
Based on the experimental results shown in Table-9, the proposed method gives competitive results and accuracy compared with the ANN and decision tree methods applied to the proposed feature extraction. It produced an accuracy rate of approximately 97%.
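The 60/40 evaluation protocol described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: samples are (features, label) pairs, each classifier is represented by a plain callable, and the shuffling seed is arbitrary. The actual decision tree, ANN and rule-based classifiers are outside the scope of the sketch, and all names are hypothetical.

```python
import random


def split_dataset(samples, train_fraction=0.6, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy_percent(classify, test_set):
    """Percentage of test samples whose predicted label matches the true label."""
    if not test_set:
        raise ValueError("empty test set")
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return 100.0 * correct / len(test_set)


def compare(classifiers, test_set):
    """classifiers maps a method name to a callable that predicts a label."""
    return {name: accuracy_percent(fn, test_set) for name, fn in classifiers.items()}


# Toy data: the feature is an integer and the label is its parity.
toy = [(i, i % 2) for i in range(10)]
train, test = split_dataset(toy)
print(len(train), len(test))
print(compare({"parity rule": lambda x: x % 2, "always zero": lambda x: 0}, test))
```

Evaluating every method on the same held-out 40% keeps the recognition rates of the different classifiers directly comparable.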

Conclusion
In this study, a new online Arabic handwriting character recognition system is developed. The recognition system deals with isolated Arabic characters only. It uses the projection profile (V, H) and a Laplacian filter in the pre-processing phase, geometrical and EDMS features in the feature extraction phase, and a rule-based method in the recognition phase. Based on the results obtained, the proposed methods are more efficient than those of related past studies. The objectives of this work, which are (a) developing a new system for online Arabic character recognition, (b) proposing a hybrid feature extraction technique for Arabic character recognition, and (c) proposing a rule-based technique for recognizing Arabic characters in an online Arabic handwriting character recognition system, are all achieved. Finally, the proposed method was implemented and evaluated by comparing the proposed feature extraction with the EDMS and Naeimizaghiani feature extraction methods using ANN and decision tree classifiers.