Tackling Skewness, Noise, and Broken Characters in Mathematical Expression Segmentation

Abstract: Segmentation is one of the most important processes in computer vision. It aims to understand the contents of an image by partitioning it into segments that are more meaningful and easier to analyze. However, this process comes with a set of challenges, including image skew, noise, and object clipping. In this paper, a solution is proposed to address the challenges encountered when using Optical Character Recognition to recognize mathematical expressions. The proposed method involves three stages: pre-processing, segmentation, and post-processing. During pre-processing, the mathematical expression image is transformed into a binary image, noise reduction techniques are applied, image component discontinuities are resolved, and skew correction is performed. Two skew correction methods are proposed: the first is deskewing using iterative PCA, and the second is PCA-prediction and line fitting-correction image deskewing. Both gave better results than the well-known Hough transformation method. In the segmentation stage, the vertical and horizontal distances between mathematical expression components are utilized to segment the components. Post-processing is employed to reassemble split symbols into a single entity. The proposed method achieves an average detection rate of 97.32%, demonstrating improved recognition outcomes for mathematical expressions.


Introduction
Optical Character Recognition (OCR) is a technology that enables computers to identify and interpret text present in images and convert it into a machine-readable format. The origins of OCR can be traced back to the early 1900s, when the technology was first developed to automate the process of reading and transcribing documents [1]. Initially, OCR systems relied on mechanical and electromechanical technology, which resulted in low accuracy. However, with the advent of digital technology in the 1960s and 1970s, the accuracy of OCR systems improved significantly. Further advancements came in the 1980s and 1990s with the development of machine learning algorithms and the application of neural networks, which led to even greater recognition accuracy. In recent years, OCR technology has undergone rapid development due to the widespread availability of high-performance computing and the emergence of deep learning techniques. As a result, OCR is now widely used in various applications such as document scanning and indexing, digital archiving, and mobile text recognition [2].
Applying OCR to mathematical expressions is challenging due to the complexity of mathematical notation, including symbols that are not found in standard text, variations in font style and size, and different notations. This makes it difficult for OCR to accurately recognize and understand mathematical expressions [3]. The presence of noise in images such as smudges, scratches, or marks, presents another challenge in using OCR to segment and recognize mathematical expressions. This noise can affect the quality of the image and make it harder for OCR to accurately recognize the symbols [4].
A further, more difficult challenge is that the image may be skewed, which makes it difficult to properly segment and recognize the characters. To improve OCR performance, it is necessary to preprocess the image and correct any skew or other distortions before attempting to recognize the characters. Deskewing can be performed using different algorithms such as the Hough transform [5], edge detection [6], and Principal Component Analysis (PCA) [7]. These techniques are used to determine the rotation or skew of the image and then rotate the image back to its original position. PCA is a statistical technique that is commonly used in image processing to reduce the dimensionality of an image and to remove noise and other unwanted variations. One specific application of PCA in image processing is deskewing, which is the process of correcting the alignment of an image that has been rotated or skewed [7].

Related Works
Within the realm of mathematical expression recognition, numerous researchers have contributed various ideas and methods. One noteworthy contribution was made by a group of authors who proposed a novel OCR method for recognizing printed mathematical expressions. The method leveraged deep learning techniques, specifically Convolutional Neural Networks (CNNs) and Long Short-Term Memory Networks (LSTMs), resulting in high accuracy on several benchmark datasets. Furthermore, the authors introduced a new dataset that contains images of printed mathematical expressions and demonstrated the efficacy of their approach on this data [8]. In another recent work, a different group of researchers presented a novel end-to-end OCR method for recognizing printed mathematical expressions. Their approach utilized Graph Convolutional Networks (GCNs), which yielded state-of-the-art performance on several benchmark datasets. By leveraging GCNs, the proposed method achieved superior results by effectively capturing and incorporating the structural information of mathematical expressions [9].
These contributions highlight the ongoing progress and advancements in the field of mathematical expression recognition. Researchers continue to innovate and develop new approaches to tackle the challenges of recognizing and interpreting these complex symbols and structures.
This paper aims to propose a solution for the difficulties faced when using OCR for mathematical expression recognition by achieving precise segmentation of the components of the expressions, thereby enhancing recognition outcomes.

Proposed Method
The proposed method consists of three main stages: preprocessing, segmentation, and post-processing. In the preprocessing stage, the image of the mathematical expression is converted into a binary image and a noise reduction operation is applied to address any noise introduced during the scanning process. Discontinuities in the image components, which can result from defects in the scanning process or from the noise removal process, are then addressed. Finally, skew correction is performed to correct any skewing that may have occurred during scanning. This step is crucial, as the suggested segmentation stage relies heavily on the vertical and horizontal spaces separating the components of the mathematical expression. The details of this stage are given in section 3.
The proper alignment of the image achieved through skew correction enables the segmentation of the components of the mathematical expression by utilizing the vertical and horizontal spaces between them. The methodology for this segmentation is discussed in detail in section 4.
The segmentation stage may produce inaccurate results, particularly for symbols that consist of multiple pieces, such as =, ÷, ≡, ≥ and ≤. To address this issue, a post-processing stage is needed that rejoins these split symbols into a single entity, ensuring the proper segmentation of the image. The details of this stage are given in section 5. The proposed method is depicted as a block diagram in Figure 1, which shows the three stages of the proposed method and the individual steps involved in each stage.

Preprocessing
There are several challenges that one may encounter when trying to segment a scanned image of a mathematical expression. These include noise [10], poor image quality [11], complex layouts, overlapping characters [12], different fonts and writing styles [13], nonuniform backgrounds [11], broken characters [14], and skew [6]. Preprocessing the image can help to address these challenges and improve the accuracy and reliability of the segmentation process. Preprocessing steps may include noise reduction, image enhancement, skew correction, and background removal.

Image Binarization
In the process of segmenting a mathematical expression, image binarization plays a crucial role in isolating the characters of the expression from the background and one another. This allows for more accurate segmentation of the image. There are several methods for performing image binarization including global thresholding [15], local thresholding [16], adaptive thresholding [17], and Otsu's method [18].
Otsu's method is a preferred option for image binarization due to its automatic nature and ability to determine the optimal threshold value based on the image histogram. This helps to effectively separate the foreground and background of the image. The effectiveness of Otsu's method has been proven in various applications [19], [20]. In this work, Otsu's method was used for image binarization to achieve accurate results.
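As an illustration of how Otsu's method selects a threshold from the histogram, the following is a minimal sketch in plain Python. It is not the implementation used in this work: the function names `otsu_threshold` and `binarize`, the 8-bit grey-level range, and the convention that dark ink becomes foreground value 1 are all assumptions made for the example.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that maximises the between-class variance.

    `pixels` is a flat iterable of integer grey levels in 0..255.
    Levels <= t form the lower class, levels > t the upper class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of grey levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, t):
    """Map a 2-D list of grey levels to 0/1 (assumption: dark ink -> 1)."""
    return [[1 if p <= t else 0 for p in row] for row in image]
```

A typical use would be `t = otsu_threshold(p for row in image for p in row)` followed by `binarize(image, t)`. In practice an image library routine would normally replace this loop.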

Noise removal
Noise reduction is a crucial preprocessing stage for removing unwanted elements from images, particularly in the context of image segmentation, where noise can result in over-segmentation. Median filtering [21], a non-linear approach, replaces a pixel value with the median value of the surrounding pixels. Because it removes noise effectively while preserving image edges, it is a valuable tool for image segmentation. The implementation of noise removal enhances the accuracy and reliability of image segmentation by minimizing the impact of noise on the image [22].
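The behaviour of median filtering described above can be sketched as follows. This is only an illustrative version on a 2-D list of pixel values. The function name `median_filter`, the square window, and the choice to clip the window at the image border are assumptions, since the text does not specify the window size or border handling.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter to a 2-D list.

    Windows are clipped at the image border. median_low is used so the
    result is always one of the original pixel values.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = median_low(window)
    return out
```

An isolated noise pixel is outvoted by its neighbours and disappears, while a pixel inside a solid stroke keeps its value because most of its window belongs to the stroke.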

Joining the Broken Characters
Broken characters in mathematical expressions can pose a challenge to proper segmentation. To overcome this issue, the gaps in the broken text can be filled in through the use of dilation [14]. Dilation is a morphological operation that grows the foreground regions of an image according to a structuring element, and it is widely used in image processing to fill in spaces or gaps. By applying dilation to the broken text, the broken characters can be joined, which improves the accuracy of the segmentation process.
Mathematically, the dilation of an image $A$ by a structuring element $B$ is written as follows [14]:

$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\}$$

This equation is based on obtaining the reflection $\hat{B}$ of $B$ about its origin and translating this reflection by $z$. A demonstration of joining broken characters can be found in Figure 2.
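To make the set definition of dilation concrete, the sketch below marks an output pixel whenever the reflected structuring element, shifted to that pixel, overlaps the foreground. The function name `dilate`, the representation of the structuring element as a list of (row, column) offsets, and the 3 x 3 square element are assumptions for illustration. The structuring element actually used in this work is not stated here.

```python
# Hypothetical 3x3 square structuring element, given as (d_row, d_col) offsets.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def dilate(image, offsets=SQUARE_3X3):
    """Binary dilation of a 2-D 0/1 list by a set of offsets.

    Output pixel z is set when some offset b satisfies image[z - b] == 1,
    i.e. the reflected element translated to z hits the foreground.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r - dr, c - dc
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc]:
                    out[r][c] = 1
                    break
    return out
```

For example, a stroke `[1, 1, 0, 1, 1]` with a one-pixel break becomes a continuous run after one dilation with the 3 x 3 element, at the cost of also thickening the stroke by one pixel on each side.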

Deskewing
During the scanning of a document, skew may occur, which leads to a distorted image of the mathematical expression. This can cause issues during segmentation, as the proposed method relies on finding the vertical gaps between characters. To address this, the image should be preprocessed to eliminate the skew so that the expression is aligned horizontally with an angle close to zero [7]. Two methods are proposed here for this purpose: the first uses iterative Principal Component Analysis (PCA), and the second finds a predicted rotation angle and corrects it through line fitting on points near the expression's midline.
The first method, using PCA, involves using the eigenvectors of the covariance matrix of the image to iteratively rotate the image until the angle of rotation is close to zero. The second method, finding a predicted rotation angle and correcting it through line fitting, involves using the horizontal and vertical projections of the image to find the angle of rotation. Both methods can be used to correct the skewness in the image and make it more suitable for segmentation.

Deskewing Using Iterative PCA
The mathematical foundation of PCA is based on linear algebra and eigenvectors. When it comes to image rotation, the PCA can determine the rotation angle of an image by identifying its principal component and using it to align the image. This approach resembles deskewing, but instead of using a fixed angle, the PCA extracts the angle from the data. The PCA can serve as a preliminary step before applying other image processing techniques to enhance their efficacy [23].
The focus is placed on the edges of the symbols and numbers in the equation to improve the calculation and accuracy. This is accomplished by employing the Canny filter, which applies two thresholds to the gradient: a high threshold for low-edge sensitivity and a low threshold for high-edge sensitivity. The edges are initially detected with low sensitivity and then expanded to include connected edge pixels from the high sensitivity result, which assists in filling any gaps in the edge detection [24] [25].
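To perform PCA, the coordinates of the outline pixels generated by the Canny filter are gathered in a matrix $A$:

$$A = \begin{bmatrix} r_1 & c_1 \\ r_2 & c_2 \\ \vdots & \vdots \\ r_N & c_N \end{bmatrix}$$

where $r_i$ and $c_i$ represent the row and column coordinates, respectively, and $N$ represents the total number of pixels.
The matrix $A$ has a size of $N \times 2$. The mean $\mu$, a row vector containing the mean of the elements in each column of $A$, is calculated using the following formula:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} A_i$$

where $A_i$ denotes the $i$-th row of $A$. The covariance matrix $C$ can be computed as:

$$C = \frac{1}{N-1} (A - \mu)^{T} (A - \mu)$$

The matrix $C$ is $2 \times 2$ and has a pair of orthogonal eigenvectors. In general, for a data matrix $A$, $\mu = \operatorname{mean}(A)$ and $C = \operatorname{covariance}(A)$. The eigenvalues of this matrix represent the magnitude of the variances, and the eigenvectors indicate the direction of these variances. The eigenvectors can be calculated as follows:

$$C v_i = \lambda_i v_i$$

where $v_i$ and $\lambda_i$ ($i = 1, 2$) represent the eigenvectors and eigenvalues of $C$, respectively [26]. These eigenvectors are then collected as the columns of a matrix of the form:

$$V = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}$$

Taking $v_1 = (v_{11}, v_{21})^{T}$ as the eigenvector with the larger eigenvalue, whose first component lies along the rows and whose second lies along the columns, the skew angle of the image can be extracted using the following formula:

$$\theta = \tan^{-1}\left(\frac{v_{11}}{v_{21}}\right)$$

So, the newly obtained angle represents both the skew angle and the direction of the image's skew [7].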
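The angle extraction from the covariance matrix can be sketched as below. For a symmetric 2 x 2 matrix the orientation of the dominant eigenvector has a closed form, so no eigen-solver is needed. The function name `pca_skew_angle`, the (row, column) point format, and the sign convention (positive when rows increase with columns) are assumptions for this example rather than details taken from the method.

```python
import math


def pca_skew_angle(points):
    """Angle in degrees of the principal axis of (row, col) points,
    measured from the horizontal (column) axis, in the range (-90, 90].
    """
    n = len(points)
    mean_r = sum(r for r, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    # Entries of the 2x2 covariance matrix (sample normalisation, N - 1).
    s_rr = sum((r - mean_r) ** 2 for r, _ in points) / (n - 1)
    s_cc = sum((c - mean_c) ** 2 for _, c in points) / (n - 1)
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in points) / (n - 1)
    # Closed-form orientation of the eigenvector with the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * s_rc, s_cc - s_rr)
    return math.degrees(theta)
```

Edge pixels lying along a line tilted by 10 degrees give an angle of 10 degrees, and pixels along a horizontal line give 0. On real edge maps the value is only an estimate, because the shape of the symbols also contributes to the covariance.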
Since this method provides an approximation of the angle of rotation, it was used iteratively to converge on an angle close to zero. This iterative process allows for fine-tuning the angle, ensuring that it is as close to zero as possible. Algorithm 1 and Figure 4 describe the steps of this method.
Step 1: Find the edges of the binary image.
Step 2: Calculate the rotation angle using PCA.
Step 3: Rotate the binary image by this angle.
Step 4: Check if the angle is close enough to zero using a threshold value or a number of iterations.
Step 5: If the angle is not close enough to zero and the number of iterations is less than the maximum iteration number, repeat from Step 2.
Step 6: Return the deskewed binary image.
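The loop in Algorithm 1 can be sketched as follows. To stay self-contained, this version works on a list of edge-pixel coordinates instead of a raster image, so the rotation is exact and the loop stops after a single correction. On a real image, rotation and edge re-detection introduce small errors, which is why several iterations are useful. The names `principal_angle`, `rotate` and `deskew_iterative`, the tolerance of 0.05 degrees and the limit of 10 iterations are all assumptions.

```python
import math


def principal_angle(points):
    """Principal-axis angle (radians) of (row, col) points w.r.t. the column axis."""
    n = len(points)
    mr = sum(r for r, _ in points) / n
    mc = sum(c for _, c in points) / n
    s_rr = sum((r - mr) ** 2 for r, _ in points)
    s_cc = sum((c - mc) ** 2 for _, c in points)
    s_rc = sum((r - mr) * (c - mc) for r, c in points)
    # The common normalisation factor cancels inside atan2.
    return 0.5 * math.atan2(2.0 * s_rc, s_cc - s_rr)


def rotate(points, angle):
    """Rotate (row, col) points by `angle` radians about their centroid."""
    n = len(points)
    mr = sum(r for r, _ in points) / n
    mc = sum(c for _, c in points) / n
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (mr + (c - mc) * sin_a + (r - mr) * cos_a,
         mc + (c - mc) * cos_a - (r - mr) * sin_a)
        for r, c in points
    ]


def deskew_iterative(points, tol_deg=0.05, max_iter=10):
    """Repeat 'estimate angle, rotate back' until the estimate is near zero.

    Returns the deskewed points and the total corrected angle in degrees.
    """
    total = 0.0
    for _ in range(max_iter):
        theta = principal_angle(points)          # Step 2
        if abs(math.degrees(theta)) < tol_deg:   # Step 4
            break
        points = rotate(points, -theta)          # Step 3
        total += theta
    return points, math.degrees(total)
```

The stopping rule combines both criteria mentioned in Step 4: an angle threshold and a maximum number of iterations.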

PCA-Prediction and Line Fitting-Correction Image Deskewing
Using PCA, the predicted value of the rotation angle is found and used to rotate the image. Points centered around the central line of the equation are identified, and a line fitting is performed on these points. The angle of inclination of this line is calculated and adopted as the final rotation angle for the image. Algorithm 2 and Figure 5 describe the steps of this method.
Algorithm 2: PCA and Line Fitting based Method for Image Deskewing
Input: Binary Image
Output: Deskewed Image
Step 1: Find the edges of the binary image.
Step 2: Use PCA to calculate the predicted value of the rotation angle for the binary image.
Step 3: Rotate the binary image by the predicted angle.
Step 4: Find points around the central line of the equation.
Step 5: Perform line fitting on the found points.
Step 6: Calculate the angle of inclination of the line.
Step 7: Adopt this angle as the final rotation angle for the binary image.
Step 8: Rotate the binary image by the final rotation angle.
Step 9: Return the deskewed binary image.
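Steps 4 to 6 of Algorithm 2 can be illustrated with a least-squares line fit. The text does not specify how the points around the central line are chosen, so this sketch assumes one candidate rule: for every column, take the mean row of the foreground pixels in that column. The function names `midline_points` and `fit_line_angle` are likewise hypothetical.

```python
import math
from collections import defaultdict


def midline_points(points):
    """Assumed rule: one (col, mean_row) point per column of (row, col) pixels."""
    rows_by_col = defaultdict(list)
    for r, c in points:
        rows_by_col[c].append(r)
    return [(c, sum(rs) / len(rs)) for c, rs in sorted(rows_by_col.items())]


def fit_line_angle(xy):
    """Least-squares fit y = a*x + b; return the inclination atan(a) in degrees."""
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    # atan2(sxy, sxx) equals atan(sxy / sxx) for sxx > 0 and avoids dividing by zero.
    return math.degrees(math.atan2(sxy, sxx))
```

After the image has been rotated by the PCA prediction, `fit_line_angle(midline_points(pixels))` would give the residual inclination to use as the final rotation angle. Other ways of selecting midline points, such as keeping only pixels in a band around the centroid row, would plug into the same fit.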

Segmentation
Binary image segmentation can be mathematically defined as the process of partitioning a set of pixels, $P$, in a binary image into multiple segments or regions, represented by a collection of non-empty disjoint subsets $S = \{S_1, S_2, \dots, S_n\}$. The partition is defined such that the union of all subsets is equal to $P$, meaning that every pixel in the binary image is assigned to exactly one subset in the partition $S$. This process can be represented by a mapping $f: P \to S$, where for each pixel $p \in P$, $f(p)$ is the unique subset $S_i \in S$ to which $p$ belongs, representing the segment or region of the binary image to which pixel $p$ belongs.
Several methods can be used to perform binary image segmentation including thresholding [11], Connected Component Analysis (CCA) [27], edge detection [28], region growing [29], watershed algorithm [30] and machine learning-based methods [31]. The choice of the method depends on the nature of the image, the desired output and the computational resources available.
In this paper, the segmentation process was carried out based on the horizontal and vertical distances that separate the main components of the equation. Horizontal and vertical projections can be easily obtained by counting the number of 1-valued pixels for each bin in the vertical and horizontal directions, respectively. For a binary image $I$ with $M$ rows and $N$ columns, the projections are calculated by summing the values of the binary image along the rows and columns, respectively, as described in the following equations:

$$H_i = \sum_{j=1}^{N} I(i, j), \quad i = 1, 2, \dots, M$$

$$V_j = \sum_{i=1}^{M} I(i, j), \quad j = 1, 2, \dots, N$$

As a demonstration, Figure 6(b) shows that the histogram of the vertical projection of the equation is divided into three separate regions $R_1$, $R_2$ and $R_3$, while Figure 6(c) shows that the histogram of the horizontal projection of $R_3$ divides it into three regions $r_1$, $r_2$ and $r_3$, which clearly represent the numerator, fraction line and denominator of the fractional part of the equation, respectively. Algorithm 3 describes the steps of the segmentation stage.
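The projections and the way their empty bins separate regions can be sketched in a few lines. The helper names `projections` and `nonzero_runs` are assumptions. The image is taken to be a 2-D list of 0/1 values, and runs are reported as half-open index ranges.

```python
def projections(image):
    """Return (horizontal, vertical) projections of a 2-D 0/1 list.

    horizontal[i] is the sum of row i, vertical[j] is the sum of column j.
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(col) for col in zip(*image)]
    return horizontal, vertical


def nonzero_runs(profile):
    """Return (start, end) half-open index pairs of maximal non-zero runs."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run begins
        elif not value and start is not None:
            spans.append((start, i))       # a run ends at an empty bin
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

Each run of the vertical projection corresponds to one region separated from its neighbours by blank columns. Applying the same helper to the horizontal projection of a region separates parts stacked above each other, such as a numerator, fraction line and denominator.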

Algorithm 3: Binary Image Segmentation
Input: Binary Image
Output: Regions with their dimensions
Step 1: Calculate the vertical projection of the binary image.
Step 2: Use the vertical projection to calculate the number of regions into which the image can be divided.
Step 3: For each region in the image,
  a. Calculate the horizontal projection.
  b. Use the horizontal projection to calculate the number of sub-regions into which the region can be divided.
  c. If the number of sub-regions is one,
    i. Extract the dimensions of the region.
    ii. Explore the next sub-region.
  d. If the number of sub-regions is greater than one,
    i. Repeat the above steps for the sub-region.
Step 4: Return the regions with their dimensions.
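One way to read Algorithm 3 is as a recursive projection cut, sketched below on a 2-D list of 0/1 values. The function names, the half-open bounding-box format `(top, bottom, left, right)` and the recursion structure are assumptions that follow the listed steps as closely as the description allows.

```python
def nonzero_runs(profile):
    """Return (start, end) half-open index pairs of maximal non-zero runs."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image, top, bottom, left, right):
    """Return bounding boxes found inside the window [top, bottom) x [left, right)."""
    boxes = []
    # Steps 1-2: the vertical projection splits the window into column regions.
    vertical = [sum(image[r][c] for r in range(top, bottom)) for c in range(left, right)]
    for c0, c1 in nonzero_runs(vertical):
        c0, c1 = c0 + left, c1 + left
        # Steps 3a-3b: the horizontal projection of this region.
        horizontal = [sum(image[r][c0:c1]) for r in range(top, bottom)]
        sub_regions = nonzero_runs(horizontal)
        if len(sub_regions) == 1:
            # Step 3c: a single sub-region, so record its dimensions.
            r0, r1 = sub_regions[0]
            boxes.append((top + r0, top + r1, c0, c1))
        else:
            # Step 3d: several stacked sub-regions, so repeat for each one.
            for r0, r1 in sub_regions:
                boxes.extend(segment(image, top + r0, top + r1, c0, c1))
    return boxes


def segment_expression(image):
    """Run the recursive cut on the whole image."""
    return segment(image, 0, len(image), 0, len(image[0]))
```

The recursion terminates because every recursive call receives a window with strictly fewer rows than its parent. For an image containing a symbol followed by a fraction, the first cut separates the symbol from the fraction, and the second level separates the numerator, fraction line and denominator.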

Post-Processing
In the projection and extraction of regions, certain characters are composed of multiple parts and are thus split into multiple regions. To ensure precise character recognition and analysis, merging these regions is crucial in obtaining the complete character. Examples of such characters are =, ÷, ≡, ≥ and ≤. Consequently, addressing this issue is necessary for achieving accurate results.

Al-Askary and Al-Momen
Iraqi Journal of Science, 2023, Vol. 64, No. 6, pp: 3998-4013

Linking Characters with More Than One Piece
During the segmentation stage, symbols like =, ÷, ≡, ≥ and ≤ can be divided into two or more pieces. For example, the symbol (=) can be split into two consecutive (−) symbols, as shown in Figure 7. It is important to reunite these segments to form a single entity. These symbols have a common attribute: they contain either a dash (−), a dot (⋅), or both. A classification system was created to determine which segments should be fused together. It is based on several features: the segment's width and height compared to the average values of the other symbols in the expression, the ratio of width to height of the segment, the area of the segment, and the distance between the centers of different segments. The average width and height were computed from the mainstream of symbols after removing any outlier values [32].
Figure 8 displays six sample images that have been skewed at varying angles, with the corresponding skew angle noted below each image. These images were selected to evaluate the performance of the two proposed deskewing methods.
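As an illustration of the piece-linking rule described in this section, the sketch below merges bounding boxes that look like dashes or dots, overlap horizontally and have vertically close centers. It covers only the =, ÷ and ≡ cases. Symbols such as ≥ and ≤ would need a further rule that is not sketched here. Every threshold (0.4, 2, 1.5), the function names and the `(top, bottom, left, right)` box format are hypothetical and are not the values of the classification system used in this work.

```python
def is_small_piece(box, avg_w, avg_h):
    """Dash-like (wide, flat) or dot-like (tiny) piece; thresholds are illustrative."""
    top, bottom, left, right = box
    w, h = right - left, bottom - top
    dash = h <= 0.4 * avg_h and 2 * h <= w <= 1.5 * avg_w
    dot = h <= 0.4 * avg_h and w <= 0.4 * avg_w
    return dash or dot


def should_merge(a, b, avg_w, avg_h):
    """Both pieces small, columns overlapping, centers vertically close."""
    if not (is_small_piece(a, avg_w, avg_h) and is_small_piece(b, avg_w, avg_h)):
        return False
    overlap = min(a[3], b[3]) - max(a[2], b[2])
    center_gap = abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)
    return overlap > 0 and center_gap <= avg_h


def merge_pieces(boxes, avg_w, avg_h):
    """Group pieces that should merge (transitively); return one box per group."""
    groups = []
    for box in boxes:
        keep, joined = [], [box]
        for group in groups:
            if any(should_merge(box, member, avg_w, avg_h) for member in group):
                joined.extend(group)   # this group is absorbed by the new piece
            else:
                keep.append(group)
        groups = keep + [joined]
    return [
        (min(b[0] for b in g), max(b[1] for b in g),
         min(b[2] for b in g), max(b[3] for b in g))
        for g in groups
    ]
```

Here `avg_w` and `avg_h` stand for the average symbol width and height computed after outlier removal. The upper width limit on dashes is one assumed way of keeping long fraction lines out of the merge.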

Experimental Results
Using the set of scanned images of mathematical expressions shown in Figure 8, the performance of skew correction was evaluated in terms of accuracy and speed by comparing the two proposed methods with the widely used Hough transformation method. The results are presented in Tables 1 and 2. Table 1 measures the error in detecting the actual skew angle, and Table 2 measures the processing time.
Samples of 20 images that were used to verify the effectiveness of the proposed segmentation method are displayed in Figure 9. The rate of correct detection is presented in Table 3, and the results of the segmentation process are shown in Figure 10.
Figure 9: Sample images for evaluating the effectiveness of the proposed segmentation method (panels include Eq16 to Eq20).
Figure 10: Results of the segmentation process for sample images using the proposed method (panels include Eq16 to Eq20).
Table 4 presents a comparative performance analysis of the method introduced in this study alongside methods from previous research carried out by other scholars. The findings indicate that the proposed approach outperforms the other techniques under consideration in terms of efficiency.

CONCLUSIONS
The results of this study demonstrate that the proposed method is effective in segmenting mathematical expressions. The proposed method involves three stages, namely preprocessing, segmentation, and post-processing. The deskewing step plays an important role in enabling the proposed segmentation algorithm to give good results. Two deskewing methods are proposed in this study, namely iterative PCA and PCA-prediction and line fitting-correction. Table 1 indicates that the two proposed methods produce better deskewing outcomes than the well-known Hough transformation method. In addition, Table 2 demonstrates that the PCA-prediction and line fitting-correction method outperforms both the proposed iterative PCA method and the Hough transformation method in terms of speed. The proposed segmentation method can segment mathematical expressions with an average detection rate of 97.32%.