Review on Hybrid Swarm Algorithms for Feature Selection

Feature selection is one of the critical processes in machine learning (ML). Its fundamental aim is to maintain classification accuracy while reducing the number of features. Many approaches have been developed for classifying datasets, and swarm techniques have produced strong results on a range of optimization problems. At the same time, hybrid algorithms have recently attracted considerable attention for solving optimization problems. This study therefore provides a thorough review of the literature (2018-2021) on hybrid swarm algorithms developed for the feature selection problem. Overall, compared with existing feature selection procedures, most hybrid algorithms improve classification accuracy.


Arabic title (translated): A Review of the Hybrid Algorithms Used in the Feature Selection Process. ابوبكر صائب عيسى
Recently, the volume of data has grown in numerous fields, including blogs, social media, scientific research, business, and medical applications. Such datasets contain many features, but not every feature is useful for extracting meaningful information from the data. Some features may be redundant or irrelevant and lower the model's performance. Removing such features is therefore critical for improving model accuracy while minimizing the computational cost [1].
The paper is structured as follows: Section 2 covers feature extraction and feature selection. Section 3 describes metaheuristic algorithms. Section 4 introduces swarm intelligence. Section 5 discusses hybrid swarm algorithms. Section 6 explains classification. Section 7 presents a literature review. Section 8 discusses comparative studies. Finally, Section 9 concludes the paper.

Feature extraction and feature selection
Feature extraction applies a linear or non-linear transformation to map the original feature space into a new, lower-dimensional space. In contrast, feature selection reduces the number of original features by selecting the subset of features that carries the most relevant and important information for classification [2]. However, feature selection is considered a difficult task because of its large search space: with $x$ features there are $2^x$ possible feature subsets, a number that grows exponentially with the number of original features [3].
Figure 1 depicts the four main processes in a feature selection algorithm. The procedure starts by generating a candidate feature subset. Each candidate subset is then evaluated with an evaluation criterion that measures the quality of the selected features. Subset generation and evaluation are repeated until a predefined stopping criterion is satisfied; depending on the method, this criterion can be the number of iterations, the number of features, or the best classification accuracy of the selected subset. Finally, the selected subset of features is submitted to a validation procedure.
Based on the evaluation criterion, feature selection methods can be divided into three primary approaches: filter, wrapper, and embedded. Wrapper approaches use a classification algorithm to evaluate and locate subsets of features, whereas embedded approaches also use a classification algorithm but select the subset during the classifier's training process [3]. Filter techniques, on the other hand, do not require a classification algorithm to create subsets; the evaluation usually depends on the inherent properties of the dataset. Table 1 lists the categories of feature selection algorithms.
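The four-step procedure above (generation, evaluation, stopping criterion, validation) can be sketched as a simple loop. This is a minimal Python sketch, not a method from the cited works: candidate subsets are generated at random as a placeholder (real methods use a search strategy such as a swarm algorithm), the stopping criterion is a fixed iteration budget, and `evaluate` and `validate` are hypothetical user-supplied functions. `count_subsets` only illustrates the $2^x$ growth of the search space.

```python
import random

def count_subsets(x):
    """Size of the search space: each of the x features is either in or out."""
    return 2 ** x

def select_features(n_features, evaluate, validate, max_iters=1000, seed=0):
    """Generic loop: generate -> evaluate -> check stopping criterion -> validate."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(max_iters):                    # 3. stopping criterion: iteration budget
        # 1. Subset generation (placeholder strategy: include each feature with prob. 0.5)
        candidate = frozenset(j for j in range(n_features) if rng.random() < 0.5)
        if not candidate:
            continue                              # skip the empty subset
        score = evaluate(candidate)               # 2. subset evaluation
        if score > best_score:
            best_subset, best_score = candidate, score
    return best_subset, best_score, validate(best_subset)   # 4. result validation

print(count_subsets(10), count_subsets(20))   # 1024 1048576: exponential growth

# Hypothetical criterion: features 0 and 2 are informative, each selected feature costs 0.1.
evaluate = lambda s: len(s & {0, 2}) - 0.1 * len(s)
subset, score, ok = select_features(5, evaluate, validate=lambda s: len(s) > 0)
print(sorted(subset), score, ok)
```

Swapping the random generator for a guided search (for example a swarm algorithm) is exactly where the methods reviewed in this paper differ.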
Table 1: Categories of the feature selection algorithms [5]

Filters
Select features using criteria that are independent of any learning algorithm. A score, or evaluation criterion, assesses the relevance of each feature, and the feature set is chosen based on these scores.

Wrappers
Wrapper approaches use a learning algorithm to evaluate the classification performance of a subset of features. The evaluation is carried out by a classifier, which measures the relevance of the feature subset.

Embedded
Embedded methods combine the wrapper and filter approaches. Because filter approaches are fast but not particularly effective, while wrapper approaches are more effective but computationally expensive, especially on large datasets, a solution combining the benefits of both approaches was needed.

Hybrid
This approach combines several primary feature selection methods and applies them sequentially.
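To make the filter/wrapper distinction in Table 1 concrete, here is a minimal Python sketch on a hypothetical toy dataset. The filter score (absolute Pearson correlation between a feature and the label) is only one common choice and involves no classifier; the wrapper score is the hold-out accuracy of a 1-nearest-neighbour classifier restricted to the chosen subset. Embedded and hybrid methods are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_scores(X, y):
    """Filter: rank each feature by |correlation with the label|; no classifier is used."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]

def one_nn_accuracy(X_train, y_train, X_test, y_test, subset):
    """Wrapper: score a feature subset by the test accuracy of a 1-nearest-neighbour classifier."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in subset)
    correct = 0
    for x, label in zip(X_test, y_test):
        nearest = min(range(len(X_train)), key=lambda i: dist(X_train[i], x))
        correct += y_train[nearest] == label
    return correct / len(X_test)

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_train = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.12, 4.4], [0.97, 1.4]]
y_test = [0, 1]

print(filter_scores(X_train, y_train))
print(one_nn_accuracy(X_train, y_train, X_test, y_test, {0}),
      one_nn_accuracy(X_train, y_train, X_test, y_test, {1}))
```

The filter score is computed once per feature, while the wrapper score retrains (here, re-queries) the classifier for every subset, which is why wrappers are more expensive.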

Metaheuristic Algorithms
Metaheuristic algorithms are optimization approaches for finding the optimal (or a near-optimal) solution to optimization problems. They offer simplicity, flexibility, and the ability to escape local optima [6].
Exploration and exploitation are the two main phases of such algorithms [7]. In the exploration phase, the algorithm searches the space broadly to find promising regions; in the exploitation phase, it performs a local search within the promising region(s) discovered during exploration [1]. Metaheuristic algorithms fall into two fundamental categories:

1-Single-solution-based metaheuristic algorithms
These algorithms begin the optimization process with a single solution, which is then updated over the optimization iterations. Their disadvantages are that they can become trapped in local optima and do not search the entire search space.

2-Population-based (multiple-solution) metaheuristic algorithms
These algorithms begin the optimization process with a set of solutions, called the "population", which are updated over the optimization iterations. The solutions help each other escape local optima and explore the search space more widely, which makes this type of algorithm well suited to real-world problems [1].
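The single-solution category can be illustrated with greedy bit-flip hill climbing over a binary feature mask (1 = feature selected). This is a hedged Python sketch: the encoding, the one-bit neighbourhood, and the toy `fitness` are assumptions for illustration. Because only improving moves are accepted, on a multimodal fitness landscape the search can stop at a local optimum, which is the weakness described above; population-based methods keep several such solutions in play at once.

```python
import random

def hill_climb(n_features, fitness, max_iters=200, seed=0):
    """Single-solution search over bit masks (1 = feature selected), maximising `fitness`."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]   # one initial solution
    current_fit = fitness(current)
    for _ in range(max_iters):
        neighbour = current[:]
        j = rng.randrange(n_features)
        neighbour[j] = 1 - neighbour[j]          # move: add or drop one feature
        f = fitness(neighbour)
        if f > current_fit:                      # greedy: worse moves are never accepted,
            current, current_fit = neighbour, f  # so the search can stall in a local optimum
    return current, current_fit

# Hypothetical toy fitness: number of bits that agree with a known "ideal" mask.
target = [1, 0, 1, 0, 0]
fitness = lambda mask: sum(a == b for a, b in zip(mask, target))
mask, fit = hill_climb(5, fitness)
print(mask, fit)
```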
As illustrated in Figure 2, metaheuristic algorithms can be classified into four groups according to their behavior: swarm intelligence-based, evolution-based, physics-based, and human-related algorithms [8].
Evolution-based algorithms: These algorithms mimic natural biological evolution. They start with a population of randomly generated solutions, and the best solutions are then combined to generate new ones using crossover, mutation, and selection of the best solutions. The genetic algorithm (GA), which is inspired by Darwin's theory of evolution, is the most widely used algorithm in this category [10].

Swarm intelligence-based algorithms (SI): These algorithms are based on the social behavior of animals, birds, and other creatures. Particle Swarm Optimization (PSO), created by Eberhart and Kennedy, is the most widely used approach in this category [11].
Physics-based algorithms: These algorithms are inspired by the physical laws of the universe. Examples include simulated annealing [12] and harmony search [13].
Human behavior-related algorithms: These algorithms are inspired by human behavior. Researchers design them by studying how the actions of each individual influence that individual's performance. Teaching-learning-based optimization (TLBO) [14] and the League Championship Algorithm [15] are among the most important examples.
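The GA operators named above for evolution-based algorithms (selection, crossover, mutation) can be sketched for feature selection with bit-mask chromosomes. This minimal Python version assumes binary tournament selection, one-point crossover, bit-flip mutation, and elitism (keeping the best individual each generation); these are common textbook choices, not details taken from [10], and the parameter values and toy fitness are illustrative.

```python
import random

def genetic_algorithm(n_features, fitness, pop_size=20, generations=100,
                      crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Maximise `fitness` over bit-mask chromosomes (1 = feature selected); needs n_features >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly chosen individuals
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]         # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)    # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

target = [1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical "ideal" subset for the toy fitness
fitness = lambda mask: sum(a == b for a, b in zip(mask, target))
best, fit = genetic_algorithm(len(target), fitness)
print(best, fit)
```

In a real wrapper setting, `fitness` would train and score a classifier on the selected features instead of comparing against a known target.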

Swarm intelligence
Every SI algorithm follows a set of basic steps, which together form the SI framework (Figure 3): initialization of the population, definition of the stop condition, evaluation of the fitness function, updating and moving the agents, and returning the global best solution. Swarm search is one of the most flexible metaheuristic searches used in feature selection: a swarm algorithm is combined with a classification technique to formulate the feature selection process. For example, let the original feature set of size $k$ be $A = \{i_1, i_2, i_3, \dots, i_k\}$. The swarm algorithm (swarm search) builds a classifier with classification error rate $e$ and returns an optimal feature subset $S$ such that $e(S) \le e(A)$, where $S$ is one of all the possible subsets in the hyperspace of $A$.
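The five framework steps can be illustrated with binary PSO, one common SI algorithm for feature selection. This Python sketch is an illustration under stated assumptions, not the method of any cited work: positions are bit masks (1 = feature kept), each bit is sampled through a sigmoid transfer function of its velocity, and the fitness is a weighted sum of a classification error rate $e(S)$ and the fraction of selected features, with an illustrative weight `alpha = 0.99`. The `toy_error` function is hypothetical; in a real wrapper it would be the error of a classifier such as KNN trained on the subset.

```python
import math
import random

def wrapper_fitness(mask, error_rate, alpha=0.99):
    """Lower is better: alpha * e(S) + (1 - alpha) * |S| / |A|; empty subsets are invalid."""
    if sum(mask) == 0:
        return float("inf")
    return alpha * error_rate(mask) + (1 - alpha) * sum(mask) / len(mask)

def binary_pso(n_features, fitness, n_particles=10, max_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Minimise `fitness` over bit masks with binary PSO (sigmoid transfer function)."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Step 1: initialise the population (positions are bit masks, velocities start at 0)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]        # Step 3 (initial): evaluate fitness
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iters):                 # Step 2: stop condition = iteration budget
        for i in range(n_particles):
            for d in range(n_features):        # Step 4: update velocity and move the agent
                r1, r2 = rng.random(), rng.random()
                v = (w * V[i][d] + c1 * r1 * (pbest[i][d] - X[i][d])
                     + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))
                X[i][d] = 1 if rng.random() < sigmoid(V[i][d]) else 0
            f = fitness(X[i])                  # Step 3: evaluate the new position
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit                    # Step 5: return the global best solution

# Hypothetical error rate: the classifier is perfect only if features 0 and 1 are kept.
toy_error = lambda mask: (2 - mask[0] - mask[1]) / 2
fit = lambda mask: wrapper_fitness(mask, toy_error)
best, best_fit = binary_pso(6, fit)
print(best, best_fit)
```

The size penalty is what makes the returned subset smaller than $A$ while still satisfying $e(S) \le e(A)$.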

Hybrid Swarm algorithms
Hybrid swarm algorithms are created by combining the best operators of different swarm algorithms, or by combining a swarm algorithm with another approach, to produce the best subset of features. Hybrid algorithms have recently received considerable attention for solving optimization problems, and various hybrid swarm algorithms have been created specifically for the feature selection problem, to obtain the most relevant and optimal feature subset from the original dataset. An algorithm that finds the optimal solution can be improved in many ways, for example by modifying the algorithm to enhance the quality of its solutions or by hybridizing it with other algorithms to overcome its specific weaknesses. The enhanced algorithms help avoid local optima and premature convergence, so that the search space is explored and exploited more effectively and efficiently. The improved algorithms also reach the optimal or a near-optimal solution and strike a better balance between the algorithm's exploration and exploitation. Some algorithms have combined the best aspects of several algorithms to produce a new one.
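A common hybridization pattern behind the improvements described above is a swarm-style global search combined with a local search that refines the best solution. The Python sketch below is a generic, hypothetical illustration of that pattern: the "follow the global best" move and the bit-flip local search are simplifications chosen for brevity, not any of the specific hybrids reviewed in the Literature Review, and fitness is minimised.

```python
import random

def local_search(mask, fitness, rng, tries=10):
    """Bit-flip local search (minimisation): keep a flip only if it lowers the fitness."""
    best, best_fit = mask[:], fitness(mask)
    for _ in range(tries):
        candidate = best[:]
        j = rng.randrange(len(candidate))
        candidate[j] = 1 - candidate[j]
        f = fitness(candidate)
        if f < best_fit:
            best, best_fit = candidate, f
    return best, best_fit

def hybrid_swarm(n_features, fitness, n_agents=8, max_iters=30,
                 p_follow=0.5, p_mutate=0.1, seed=0):
    """Hypothetical hybrid: swarm-style global moves plus local search on the global best."""
    rng = random.Random(seed)
    agents = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_agents)]
    gbest = min(agents, key=fitness)[:]
    gbest_fit = fitness(gbest)
    for _ in range(max_iters):
        for k, agent in enumerate(agents):
            # Swarm move: copy each bit from the global best with prob. p_follow ...
            moved = [g if rng.random() < p_follow else a for a, g in zip(agent, gbest)]
            # ... then flip bits at random to keep exploring
            agents[k] = [1 - b if rng.random() < p_mutate else b for b in moved]
            f = fitness(agents[k])
            if f < gbest_fit:
                gbest, gbest_fit = agents[k][:], f
        # Hybridisation step: exploit around the global best with a local search
        gbest, gbest_fit = local_search(gbest, fitness, rng)
    return gbest, gbest_fit

target = [0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical "ideal" subset
fitness = lambda mask: sum(a != b for a, b in zip(mask, target))   # errors to minimise
best, best_fit = hybrid_swarm(len(target), fitness)
print(best, best_fit)
```

The swarm move supplies exploration across the population, while the local search supplies exploitation around the best solution, which is the balance hybrid methods aim for.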

Classification
Classification is the process of building a model that describes data concepts or classes, in order to determine the class of objects whose class label is unknown [17]. Classification has two basic steps: 1-Training: a model is built from training data whose class labels are known. 2-Testing: the model is tested by assigning class labels to the data objects in the test dataset [18]. Two examples of classification are labeling an email as "spam" or "non-spam" and assigning a diagnosis to a patient based on observable patient characteristics (blood pressure, age, the presence or absence of specific symptoms, etc.). Classification is an example of supervised learning, because a training set of correctly labeled observations is available [19]. Clustering, in contrast, is an unsupervised procedure that groups data using some inherent similarity or distance measure. Classifiers fall into three categories: regression-based, generative, and discriminative classifiers [19]. KNN has been one of the most popular classifiers in recent years, alongside others such as decision trees, SVM, and Naive Bayes.
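The two classification steps can be illustrated with KNN, which the paragraph above lists among the most popular classifiers. In this minimal Python sketch, "training" simply stores the labelled examples, and "testing" assigns each test object the majority label among its k nearest training examples (squared Euclidean distance). The two-feature spam/non-spam data are hypothetical.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training examples (squared Euclidean distance)."""
    dists = sorted(
        ((sum((a - b) ** 2 for a, b in zip(row, x)), label)
         for row, label in zip(train_X, train_y)),
        key=lambda t: t[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Testing step: fraction of test objects whose predicted label matches the true one."""
    preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)

# Training step: for KNN, "training" just stores the labelled examples.
# Hypothetical two-feature data (e.g. counts of two suspicious words per email).
train_X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
train_y = ["non-spam"] * 3 + ["spam"] * 3
test_X, test_y = [[1.5, 1.5], [8.5, 8.5]], ["non-spam", "spam"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))   # -> non-spam
print(accuracy(train_X, train_y, test_X, test_y))  # -> 1.0
```

Because KNN has no real training cost, it is cheap to re-run for every candidate subset, which helps explain its popularity in wrapper-based feature selection.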

Literature Review
This section reviews several works (2018-2021) on classification systems that use hybrid swarm algorithms to select the best features.
In 2018, Rajamohana and Umamaheswari [20] proposed a hybrid of enhanced binary particle swarm optimization and the shuffled frog leaping algorithm (SFLA) to reduce the high dimensionality of the feature set and select the best features. K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB) classifiers were then used for spam review detection. The dataset, created by Ott et al., contains 1600 reviews of the 20 most significant Chicago hotels; 80% was used for training and 20% for testing. The hybrid method was compared with binary PSO and SFLA alone and was more accurate with all three classifiers (NB, KNN, and SVM).
In 2019, Al-Tashi et al. [21] built on the strength of the HGWO hybrid algorithm to present a new binary version of it called BGWOPSO. The method was evaluated on 18 benchmark datasets from the UC Irvine Machine Learning Repository using a KNN classifier and compared with four state-of-the-art techniques (BPSO, bGWO2, WOASAT-2, and BGA).
In 2019, T. Keerthika and K. Premalatha [22] presented a new hybrid optimization algorithm, HFSBEE, which combines two swarm algorithms: the fish algorithm and the artificial bee colony (ABC). The new algorithm addresses the shortcomings of the fish algorithm, which converges slowly and takes a long time to find an optimal solution, as well as the high computational cost of the ABC algorithm. Classification is performed with a multi-kernel support vector machine, and the approach is tested on three datasets from the UCI machine learning repository: Hungarian, Swiss, and Cleveland. The proposed approach outperforms the traditional fish optimization and ABC algorithms.
In 2020, Sundaramurthy S. and Jayavel [23] combined PSO with Grey Wolf Optimization (GWO) to create an efficient Rheumatoid Arthritis (RA) disease prediction system. The method has two main phases: the first relies on PSO to generate the initial population, and the second relies on GWO to choose the optimal subset of features. The C4.5 classifier was used to predict RA. The dataset was collected from patients at the outpatient unit of the Shakthi Rheumatology Unit in Coimbatore; one thousand patients (375 male and 625 female) were evaluated for model prediction.
In 2020, Sagban et al. [24] used the Binary Bat algorithm to optimize feature selection, benefiting from the algorithm's frequency tuning and automatic zooming. Classification was then performed with the Ant-Miner classifier using five-fold cross-validation, with 20% of the data used for testing in each fold. The dataset was a cervical cancer dataset of 858 patients downloaded from the UCI repository.
In 2021, Thawkar S. [25] proposed a hybrid approach (TLBO-SSA) for analyzing mammograms of breast cancer patients. The salp swarm algorithm (SSA) improves the efficiency and convergence of the basic teaching-learning-based optimization (TLBO) technique by merging SSA's update mechanism into TLBO's main structure. The classifier employed is the adaptive neuro-fuzzy inference system, which achieved a precision of 98.46%. The Breast Cancer Wisconsin (WBC) Diagnostic Dataset was used.
In 2021, Abdel-Basset M. et al. [26] presented a hybrid approach (HHOBSA) that combines the Harris Hawks algorithm with simulated annealing to classify datasets from various domains, such as computer, biology, life, financial, statistical, and physical data. Simulated annealing (SA) improves the performance of HHOBSA and helps it avoid local optima, while two bitwise operations (AND and OR) randomly transfer the most useful features from the best solution to the other solutions in the population to improve their quality. A KNN classifier is used to rate the quality of the solutions. The highest classification accuracy obtained was 85%.
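HHOBSA relies on simulated annealing, the physics-based method listed in the Metaheuristic Algorithms section, to escape local optima. Its core idea is the Metropolis acceptance rule: a worse move is accepted with probability $\exp(\Delta / T)$, where the temperature $T$ decreases over time. The Python sketch below shows plain SA on a binary feature mask only, not the HHOBSA hybrid itself; the geometric cooling schedule, parameter values, and toy fitness are assumptions.

```python
import math
import random

def simulated_annealing(n_features, fitness, t0=1.0, cooling=0.95, max_iters=300, seed=0):
    """Maximise `fitness` over bit masks (1 = feature selected) with simulated annealing."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    t = t0
    for _ in range(max_iters):
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = 1 - candidate[j]            # neighbour: flip one feature
        delta = fitness(candidate) - current_fit    # > 0 means the move improves fitness
        # Metropolis rule: always accept improvements, accept worse moves with prob. exp(delta / t)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, current_fit + delta
        if current_fit > best_fit:
            best, best_fit = current[:], current_fit
        t = max(t * cooling, 1e-9)                  # geometric cooling, floored to avoid division by zero
    return best, best_fit

target = [0, 1, 1, 0, 1, 1]   # hypothetical "ideal" subset for the toy fitness
fitness = lambda mask: sum(a == b for a, b in zip(mask, target))
best, best_fit = simulated_annealing(len(target), fitness)
print(best, best_fit)
```

At high temperature the search accepts many worse moves (exploration); as $T$ falls it behaves like greedy hill climbing (exploitation).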
In 2021, Fahad et al. [27] proposed an approach (ACO-SU) that combines ant colony optimization (ACO) with symmetric uncertainty (SU). The method assesses the usefulness of incoming features and discards those that are not required, updating the selected feature set whenever a new feature is identified. It was tested on 14 medical image datasets from the UCI repository, and three classifiers (JRip, J48, and decision table) were used to evaluate its quality. The average classification accuracy was 72.69%.
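Symmetric uncertainty is a normalised form of mutual information, $SU(X,Y) = 2\,I(X;Y) / (H(X)+H(Y))$, which ranges from 0 (independent variables) to 1 (each variable fully predicts the other). The Python sketch below computes it for discrete-valued features; how ACO-SU [27] combines this score with ant colony optimization is not reproduced here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0                      # both variables constant: define SU as 0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

label = [0, 0, 1, 1]
print(symmetric_uncertainty([0, 0, 1, 1], label))   # -> 1.0 (feature identical to the label)
print(symmetric_uncertainty([0, 1, 0, 1], label))   # -> 0.0 (feature independent of the label)
```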
In 2021, L. Meenachi and S. Ramakrishnan [28] hybridized the ant colony optimization algorithm with the Tabu search local search algorithm and a fuzzy rough set to predict cancer by selecting the best features from microarray gene expression data. The ant colony algorithm is hybridized with the fuzzy rough set to find globally optimal features; Tabu search is then hybridized with the fuzzy rough set to find locally optimal features. With the fuzzy rough nearest neighbor classifier, the results were compared with the standard ant colony algorithm and Tabu search, and the hybrid achieved better results on all datasets and with all classifiers used. The medical datasets are small round blue cell tumor (SRBCT), diffuse large B-cell lymphoma (DLBCL), breast cancer, and leukemia datasets; in addition, a non-medical swarm behavior dataset from the UCI ML repository was used to demonstrate the generalization capability of the proposed algorithms.
In 2021, Dharmalingam and Kumar [29] hybridized multi-objective PSO with the Tabu search algorithm to address the shortcomings of conventional single-objective PSO. The method generates a set of optimal solutions, which are then used to select the best features extracted from lung computed tomography (CT) images. At the classification stage, a KNN classifier with normal distribution and class probability is used. The dataset includes lung CT images from Stanley Medical Hospital and other scan centers.
In 2021, Adamu et al. [30] created an alternative wrapper-based feature selection approach by combining the crow search algorithm (CSA) and PSO. CSA was first modified with a chaotic map to address its population diversity issue, and the enhanced CSA was then combined with PSO to create the Enhanced Chaotic Crow Search Particle Swarm Optimization Algorithm (ECCSPSOA). The opposition-based learning (OBL) local search method is used to escape local optima, and a KNN classifier evaluates the selected features. The study used 15 well-known benchmark datasets from the UCI repository.
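Two of the ingredients mentioned for ECCSPSOA, a chaotic map and opposition-based learning, are easy to illustrate. The review does not say which chaotic map [30] uses, so the logistic map $x_{n+1} = \mu x_n (1 - x_n)$ with $\mu = 4$ (a standard chaotic setting) is an assumption in this Python sketch; the opposite of a point $x$ in $[lb, ub]$ is taken as $lb + ub - x$, the usual OBL definition.

```python
def logistic_map(x0, n, mu=4.0):
    """First n iterates of the logistic map x -> mu * x * (1 - x); chaotic for mu = 4, x0 in (0, 1)."""
    seq, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq

def opposite(solution, lower, upper):
    """Opposition-based learning: reflect each coordinate inside its bounds (lb + ub - x)."""
    return [lo + up - v for v, lo, up in zip(solution, lower, upper)]

# E.g. chaotic values could replace uniform random numbers in a position update,
# and a solution and its opposite can both be evaluated, keeping the better one.
print(logistic_map(0.2, 3))
print(opposite([0.2, 3.0], [0.0, 0.0], [1.0, 10.0]))
```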
In 2021, Sathiyabhama et al. [31] hybridized the GWO algorithm with rough set theory (RST) to select the feature subsets that give the highest classification accuracy, using the rough-set positive region and dependency function as the fitness function. The authors used J48, decision table, Naive Bayes, and IBk classifiers; the best classification performance was obtained with the decision table on five datasets: GLCM 0 (96.3%), ISF (96.4%), GLCM 90 (91.9%), GLCM 45 (93.3%), and GLCM 135 (94.1%).
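The rough-set dependency degree is usually defined as $\gamma_P(D) = |POS_P(D)| / |U|$: the fraction of objects whose equivalence class under the condition attributes $P$ falls entirely within one decision class. Below is a minimal Python sketch on a hypothetical decision table; how [31] weights this quantity inside its GWO fitness function is not specified here.

```python
from collections import defaultdict

def dependency_degree(rows, decisions, subset):
    """Rough-set dependency gamma_P(D) = |POS_P(D)| / |U| for the attribute indices in `subset`."""
    # Group objects into equivalence classes by their values on the chosen attributes
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[j] for j in subset)].append(i)
    # Positive region: objects whose equivalence class has a single decision value
    positive = sum(len(members) for members in classes.values()
                   if len({decisions[i] for i in members}) == 1)
    return positive / len(rows)

# Hypothetical decision table: attribute 0 determines the decision, attribute 1 does not.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
decisions = ["no", "no", "yes", "yes"]
print(dependency_degree(rows, decisions, [0]))   # -> 1.0
print(dependency_degree(rows, decisions, [1]))   # -> 0.0
```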
In 2021, Al-Wajih et al. [32] proposed a hybrid approach (HBGWOHHO) that combines the Harris Hawks and Binary Grey Wolf algorithms and validated it on 18 standard UCI benchmark datasets. The quality of the selected features is assessed with a wrapper-based KNN. The performance of the hybrid technique was compared with the Binary Grey Wolf Optimizer (BGWO), Binary Harris Hawks Optimizer (BHHO), Binary Particle Swarm Optimization (BPSO), Binary Genetic Algorithm (BGA), and the binary hybrid BWOPSO. The proposed approach achieved an average accuracy of 92%.
In 2021, Kitonyi and Segera [33] proposed an approach (HGDGWO) for the feature selection problem that combines the well-known population-based metaheuristic Grey Wolf Optimizer with the iterative gradient descent optimization algorithm. Six medical datasets from the UCI machine learning repository were used in the classification stage. The accuracy of the technique was tested five times with SVM and KNN classifiers, and some datasets reached 100% accuracy. Compared with existing approaches (BGWOPSO, BGWO), however, the proposed approach did not achieve the highest accuracy on all datasets.
In 2021, Chen et al. [34] proposed BP-PSO, an approach that combines an adaptive PSO with a backpropagation neural network to perform feature selection. To evaluate the quality of the proposed approach, a set of classifiers and ranking methods was applied (SVM-RFE, DTree, Wilcoxon test, RF, and t-test). The datasets used for classifying cancer samples were taken from the TCGA database. The results indicate that BP-PSO has an average accuracy 8.65% higher than the suboptimal NDFs model across the datasets and that its performance is 2.31-18.62% higher than the benchmark approach on all datasets.
Table 2 compares the hybrid method with various swarm algorithms.

Discussion
The literature reviewed in the previous section shows that hybrid swarm algorithms used for feature selection can enhance classification accuracy [23]. The studies in [20], [21], [22], [24], [25], [29], [31], and [32] achieved accuracies between 90% and 99%, while the other studies achieved less than 90%. Table 3 presents a comparison of classification accuracy between various feature selection techniques and the hybrid methods based on different swarm algorithms.

Conclusion
This research provided a thorough survey of hybrid SI-based feature selection algorithms. Its main purpose is to examine hybrid algorithms for feature selection tasks and to compare their classification accuracy with that of the original techniques and state-of-the-art algorithms. The survey leads to the following observations, which can help in choosing the best hybrid approach for a given problem:
1-Since feature selection is a binary problem, most hybrid techniques are built on binary versions of the algorithms.
2-Hybrid approaches are used to address the problem of balancing exploitation and exploration.
3-Most hybrid algorithms aim to improve the fitness value, which improves both classification accuracy and the selected feature subset.
4-Owing to their strength and high capacity for hybridization, KNN is the most widely used classifier and PSO is the most widely used algorithm for hybridization.

Figure 2: Classification of metaheuristic algorithms [9]
Figure 3 (SI framework): 1. Initialization of the population; 2. Definition of the stop condition; 3. Evaluation of the fitness function; 4. Updating and moving agents; 5. Returning the global best solution.

Table 2: Comparison of using the hybrid method with different swarm algorithms