Effect of Genetic Algorithm as a Feature Selection for Image Classification

Analysis of image content is important for image classification, identification, retrieval, and recognition. The medical image datasets used for content-based medical image retrieval (CBMIR) are large, and retrieval from them is limited by high computational cost and poor performance. The aim of the proposed method is to improve image retrieval and classification by using a genetic algorithm (GA) to select a reduced set of features and lower the dimensionality. The method has three stages. In the first stage, two algorithms are applied to extract the important features: a contrast enhancement method and the discrete cosine transform. In the second stage, GA-based feature selection is applied to the medical image datasets to obtain feature vectors, and dataset images are matched to query images using a correlation coefficient. The third stage uses a diverse density-based relevance feedback technique to enhance the performance of the CBMIR. The suggested procedure may be used to retrieve images of breast cancer, brain cancer, lung cancer, thyroid cancer, etc. By using a GA-based feature selection algorithm to determine the best subset of features, the dimensionality of the system is reduced. The suggested method achieves higher precision, recall, and F-score than the other techniques.


Introduction
An image retrieval system is used to browse, search, and retrieve relevant images from large datasets. Medical images have long been used for diagnosis and treatment. Physicians' research and clinical pathology are well supported by the medical sector. Many hospitals and medical centers produce large quantities of medical images daily. The analysis of these huge collections of data is crucial for making fundamental medical decisions and resolving difficult medical ambiguities [1]. CBMIR is one of the most active research areas in medical imaging. The diagnosis history of a patient can be retrieved from a medical database to compare similarities in diagnosis methodologies. When selecting features, it is important to choose relevant features that produce a high recognition rate at a low cost. High-dimensional features can increase recognition rates, but they also increase system complexity [2]-[7].
We need an algorithm to extract features independent of other features; selecting the best features is important. The training phase must include a learning model for selecting effective features. This paper reduces the complexity of a highly complicated model using a genetic algorithm (GA). GAs are routinely used as search heuristics. In a GA [8]-[10], iterative optimization is applied to a population of candidate solutions (feature vectors). Typically, the iteration process, in which each iteration is called a generation, begins with a random population of candidates. During each iteration, candidate solutions are evaluated using an objective function. The candidates that score best on the objective then seed the population for the next iteration.
Because of the limitations of traditional approaches, a method for accessing an image will depend on its content or features [11], and the same result can also be achieved using methods such as CBMIR [12]. People need innovative technology that allows them to access images, at any point on the internet, without extracting text from a huge quantity of images. To describe images efficiently, traditional text-based techniques are replaced by texture features. In the medical field, image feature libraries store the image features obtained via feature extraction techniques. Because of the importance of medical image retrieval in the treatment and diagnosis of human ailments, many doctors and researchers have focused on it. The rest of the paper is organized as follows: Section 2 reviews related work. Section 3 describes the technical approach. Section 4 presents the results and discussion, and Section 5 gives the conclusion and future work.

Related work
This section discusses the methods related to the proposed method. Essential aspects of the proposed method are the extraction of features and the optimization of feature selection. Many methods can be used for image retrieval and classification, such as Fast Fourier Transforms (FFT), Fourier Spectral Binarization (FSB), Histogram of Compressed Scattering Coefficients (HCSC), and so on. These methods are described below. In feature extraction, Ahmed et al. [13] used FFT to reduce selected convolutional features to bits. A pre-trained convolutional neural network mapped medical images to highly responsive convolutional features. An optimal subset selection algorithm could be used to map neuronal responses. A dense binary code is generated by applying the FSB to the global mean activations of this type of feature map. These transformations produce highly discriminative hashes. Subash and Nagarajan [14] represented images by features derived from them. A local curve pattern was derived from an image's line and curve features to describe the image efficiently.
Li et al. [15] developed a hybrid method combining local and global features for color image retrieval. The method uses CILDP and BoVW. Local and global image features were captured using CILDP.
Varish and Pal [16] presented a new scheme for representing images based on GLCM. The first step is to estimate uniform-quantized histograms based on the DC coefficients. A DC feature vector was then constructed using statistical parameters obtained from this histogram. Specific statistical parameters are also evaluated from the GLCM of a residual image to create the GLCM-based feature vector. Jenitta and Samson [17] extracted texture features from medical images based on LPDs and GLCMs. Combining LMCoP and LVCoP led to the introduction of LMVCoP. To produce the LMCoP, the GLCM and LMeP were combined. LVCoP is obtained by integrating LVP and GLCM. These hybrid methods have given high results.
Zahid et al. [3] suggested a method for representing the image using the weighted mean of a triangular histogram. In this work, the image's spatial content is added to the BoVW's inverted indices. This reduced over-fitting issues and narrowed the semantic gap between higher-level and lower-level image features.
In optimal feature selection, Zhou et al. [18] proposed a method based on a dynamic strategy to achieve approximate optimization at low computational cost.
Li et al. [2] proposed a dividing-based evolutionary algorithm for large-scale feature selection with four objectives. The dividing-based many-objective evolutionary algorithm (DMEA-FS) searches for the optimal features based on four objectives: (1) identifying features; (2) correcting errors; (3) intra-class distances; and (4) the distances within classes. They proposed two new structures, namely the wrapper structure and the filter structure, to achieve low computation costs and high accuracy. DMEA-FS consists of six steps: initialization, evaluation, variable division, convergence optimization, diversity optimization, and decision-making. These six steps include new strategies for improving DMEA-FS's search performance: (a) two plans of action, including setting archives and reducing dimensions faster; and (b) the use of dynamic weights. To classify related variables, a new classification method called mapping-based variable dividing is proposed. To determine the final solution, the minimum Manhattan distance is used to introduce approximate triangle-approximating decision-making.
In [19], global decisions are made to obtain a good-quality fused image. OTLBO, used in [5]-[7], achieves high fitness and has become one of the most popular optimization methods. Medical images are retrieved using many algorithms. Different medical image retrieval systems represent medical images using traditional texture features. LBP [20] and its improvements [21]-[24] have been studied. This line of work develops several local encoding schemes to describe the local image contents from various perspectives. HCSC features have been proposed for retrieving medical images [25]. Considering all directional information is crucial to better performance.

Technical Approaches
Feature selection for CBMIR is performed using a GA, since selecting the optimal features reduces the dimensionality problems that can arise during data analysis. The correlation coefficient between dataset images and query images is then measured. An algorithmic approach based on diverse density (DD) is employed to enhance the performance of the proposed method. Images that are relevant to the query are selected based on relevance feedback on a query image. Figure 1 illustrates the overall system design of the proposed method, which comprises feature extraction, feature selection, and diverse density-based relevance feedback.

Feature Extraction
In feature extraction, the dominant features of an image are identified by analyzing medical images. The proposed method uses texture features, which are extracted using contrast enhancement techniques and the discrete cosine transform.

Contrast Enhancement Techniques (CE)
Medical images can exhibit weaknesses such as low contrast and blurriness; these problems must therefore be reduced by spreading the color values over their full possible range. This study used global contrast stretching.

Global Contrast Stretching Technique
Global Contrast Stretching (GCS) is a collection of techniques designed to solve global problems, such as poor or excessive lighting in the source environment. An image can be enhanced based on the luminance information of the entire image. An image with high global contrast shows greater resolution and variation. Images with low contrast, on the other hand, show less detail. The stretched value of the pixel at location (x, y) is given by Eq. (1), which relates each pixel to the minimum and maximum values taken over the red, green, and blue components. The technique considers the ranges of all color components simultaneously, so the minimum and maximum are computed over the combined components. In this study, a single minimum value and a single maximum value are therefore used [26, 27].
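Eq. (1) is not reproduced in this text, so the following is only a minimal sketch of global contrast stretching, assuming the common linear min-max form with an 8-bit output range. The function name `global_contrast_stretch` and the nested-list image layout are illustrative choices, not part of the original method.

```python
def global_contrast_stretch(image, out_min=0, out_max=255):
    """Stretch an RGB image using one global minimum and one global maximum.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    Assumption: linear min-max stretching to the range [out_min, out_max].
    """
    # A single minimum and maximum taken over all three channels together.
    values = [c for row in image for pixel in row for c in pixel]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat image: nothing to stretch, so return an unchanged copy.
        return [[tuple(pixel) for pixel in row] for row in image]
    span = out_max - out_min
    return [
        [tuple(round((c - lo) * span / (hi - lo) + out_min) for c in pixel)
         for pixel in row]
        for row in image
    ]
```

Because the same minimum and maximum are applied to every channel, the relative balance between red, green, and blue is preserved, which is the property the text attributes to the global variant.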

Discrete Cosine Transform (DCT)
Ahmed, Natarajan, and Rao introduced the DCT in 1974. The DCT is commonly used in image compression applications. It decorrelates image data and can therefore be used to reduce dimensionality. The coefficients are scanned in zigzag order, which ranks them by decreasing importance so that the high-variance coefficients are picked first. Eqs. (2) and (3) define the DCT, where I(x, y) denotes the pixels of a gray image of size N × M and DCT(i, j) denotes the resulting coefficients.
The coefficients with the highest variance are mostly located in the upper-left corner of the DCT matrix. Starting from the upper-left corner, the DCT coefficient matrix is scanned zigzag-wise and converted into a one-dimensional (1-D) vector [28]. The coefficients with the highest importance lie at the top-left corner of the block, as shown in Figure 2.
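Since Eqs. (2) and (3) are not reproduced here, the sketch below assumes the standard orthonormal 2-D DCT-II and a JPEG-style zigzag order. The names `dct_2d` and `zigzag` are hypothetical, and the direct summation is intended for small blocks only.

```python
import math


def dct_2d(image):
    """Orthonormal 2-D DCT-II of a grayscale image given as a list of rows."""
    n, m = len(image), len(image[0])

    def alpha(k, size):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)

    coeffs = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for x in range(n):
                for y in range(m):
                    total += (image[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * j * math.pi / (2 * m)))
            coeffs[i][j] = alpha(i, n) * alpha(j, m) * total
    return coeffs


def zigzag(matrix):
    """Flatten a matrix to a 1-D list by scanning anti-diagonals from the top-left."""
    n, m = len(matrix), len(matrix[0])
    order = sorted(
        ((i, j) for i in range(n) for j in range(m)),
        # Sort by anti-diagonal first, then alternate direction on each diagonal.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [matrix[i][j] for i, j in order]
```

A feature vector of length k can then be taken as `zigzag(dct_2d(block))[:k]`, which keeps the low-frequency coefficients nearest the top-left corner.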

Feature Selection
A good feature selection algorithm is needed to maximize accuracy while minimizing computation time. The algorithm must therefore remove redundant, irrelevant, and noisy features. In this paper, GA-based optimal feature selection is used. The search space can be viewed as a binary search tree in which the root represents all features and the leaves represent subsets. The algorithm maintains a record of both the current criterion value and the best subset as it traverses the tree down to the leaves.
The GA generates a population of binary strings. The population is initialized and its fitness is evaluated, taking into account the other features in the initial population. Features that do not satisfy the fitness function are removed from the population. The optimal feature set is formed by comparing all strings using the fitness function. If none of the features satisfy the fitness function, a new fitness function is determined. The search then continues until the optimal value is selected [9].
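The steps above can be sketched as a small binary-string GA. The GA parameters of Table 1 are not reproduced in this text, so the defaults below (population size, number of generations, crossover and mutation rates) and the choice of tournament selection, single-point crossover, and elitism are illustrative assumptions. The function name `ga_select` and the caller-supplied `fitness` function are hypothetical.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=50,
              crossover_rate=0.8, mutation_rate=0.02, target=None, seed=0):
    """Search for a binary feature mask that maximises `fitness(mask)`."""
    rng = random.Random(seed)
    # Random initial population of binary strings (1 = feature kept).
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        if target is not None and fitness(best) >= target:
            break  # satisfactory fitness level reached
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best
```

In a retrieval setting, `fitness` would typically score a mask by the retrieval quality obtained with the selected features; a toy stand-in such as `lambda mask: sum(mask[:3]) - 0.5 * sum(mask[3:])` is enough to exercise the loop.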
The iteration process, in which each iteration is called a generation, starts with a random population of candidates. In each iteration, the candidate solutions are evaluated by an objective function. New populations of candidate solutions are then generated for the next iteration based on the objective values. The algorithm terminates when the maximum number of generations is reached or when the population reaches a satisfactory fitness level. Table 1 shows the GA parameters. Algorithm 1 shows the GA steps [29][30][31]. The F-score is calculated from the average precision and recall rates (ARP and ARR, respectively), as given in Eq. (6).
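Eq. (6) is not reproduced in this text. As a hedged illustration, the sketch below assumes the usual definitions: precision and recall per query, their averages over all queries (ARP and ARR), and the F-score as the harmonic mean of ARP and ARR. The function names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given retrieved and relevant image ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def retrieval_scores(queries):
    """Return (ARP, ARR, F-score) for a list of (retrieved_ids, relevant_ids) pairs."""
    pairs = [precision_recall(r, g) for r, g in queries]
    arp = sum(p for p, _ in pairs) / len(pairs)
    arr = sum(r for _, r in pairs) / len(pairs)
    # Assumed form: harmonic mean of the averaged precision and recall.
    f_score = 2 * arp * arr / (arp + arr) if arp + arr else 0.0
    return arp, arr, f_score
```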

Datasets
The experiments in this study were conducted on two publicly available CT image datasets, namely (a) TCIA-CT [32] and (b) NEMA-CT [33]. Each image is saved with a dimension of 512 by 512 pixels. Table 2 summarizes the key points of each dataset [21].

Experiments on Dataset
We conducted retrieval tests on the two datasets cited above. Both datasets use the same experimental setup. CT images from a dataset are used as queries, compared to the other images in the dataset, and the most similar images are retrieved. We consider the ten images with the highest similarity as the retrieval outcome. Figures 3 and 4 show example images from each group of the TCIA-CT and NEMA-CT datasets, respectively.
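The retrieval step described above can be sketched as a ranking by correlation coefficient. The exact similarity formula is not reproduced in this text, so the sketch assumes the Pearson correlation between feature vectors. The names `pearson` and `retrieve`, and the dictionary layout of the database, are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length feature vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # constant vector: correlation is undefined, treated as 0 here
    return cov / (su * sv)


def retrieve(query, database, k=10):
    """Return the ids of the k database vectors most correlated with the query.

    `database` maps an image id to its feature vector.
    """
    ranked = sorted(database,
                    key=lambda image_id: pearson(query, database[image_id]),
                    reverse=True)
    return ranked[:k]
```

With `k=10` this corresponds to the top-ten setting used in the experiments; the feature vectors would be the GA-selected DCT coefficients.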

Retrieval Result
Figures 5 and 6 show the retrieval results for the two datasets. All of the top 10 similar images for a query image were from the same category. The ARP, ARR, and F-score performance evaluations for these two experiments are presented below. As indicated in Table 3, the proposed method obtained the highest values, 0.99967 and 0.99910.

Discussion
During pre-processing, the input image is resized and enhanced. GCS is used to improve the images, which raises their contrast. It also prevents overamplification of noise and supports the extraction of texture features from medical images. The optimal features are then selected using GA, whose effectiveness can be observed in its ability to reduce dimensionality and select features effectively. Medical images from the specified datasets are used to analyze the system's performance. ARP, ARR, and F-score are improved in the proposed method compared to the other existing methods, as shown in Table

Conclusion and Future Works
In addition to image retrieval, this study developed a method for improving feature selection in CBMIR. A variety of images from medical image datasets are used as test images to test the implemented system. We propose a method for extracting visual features, high-level features, and low-level features. We implement GA to optimize the feature vectors, select the features, and classify images based on correlation coefficients. Two CT image databases are used for the medical image retrieval experiments. With the proposed method, the performance in each category is better, and the ARP (%) is higher than that of the other methods. In future studies, hybrid meta-heuristic methods will be used to classify another dataset of medical images and diagnose the images, to assess how effective meta-heuristic methods are at diagnosing diseases.

Figure 1: Diagram of the proposed method

Figure 5: Images retrieved from the dataset for a query: the query image and the top 10 retrieved images.

Table 1: The GA parameters

Table 2: Image dataset summary

Table 3: Comparison of different techniques on the TCIA-CT and NEMA-CT datasets