Community Detection in Modular Complex Networks Using an Improved Particle Swarm Optimization Algorithm

Community detection is an important topic for understanding and analyzing the structure of complex networks. Detecting hidden partitions in complex networks has been proven to be an NP-hard problem that traditional methods may not solve accurately. It is therefore commonly modeled in the literature as an optimization problem and solved using evolutionary computation methods. In recent years, many researchers have addressed community structure detection by developing algorithms based on single-objective optimization methods. This study continues that line of research by improving the Particle Swarm Optimization (PSO) algorithm with a local improvement operator, so that it discovers community structure in modular complex networks more effectively when the modularity density metric is employed as a single-objective function. The framework of the proposed algorithm consists of three main steps: an initialization strategy, a movement strategy based on perturbation genetic operators, and an improvement operator. The key idea behind the improvement operator is to identify the nodes that appear to be located in the wrong communities, namely nodes for which the majority of topological links do not belong to their current communities, and to reassign them to the community they appear to belong to. The proposed algorithm was tested and evaluated on publicly available modular complex networks generated using a flexible and simple benchmark generator. The experimental results showed the effectiveness of the suggested method in discovering community structure over modular networks of different complexities and sizes.


1. Introduction
In the last decade, the analysis of complex networks has attracted great interest, since many complex systems present today, such as social networks, collaboration networks, metabolic networks, neural networks, technological networks, and political election networks, can be modeled as complex networks. Mathematically, a graph is an efficient and widely used way to represent a complex network: the graph nodes correspond to the objects of the complex network, and the graph edges correspond to the connections between these objects. A key feature of most, if not all, complex networks is community structure; consequently, the discovery of hidden community structure has received the attention of a large number of researchers from various scientific disciplines. Informally speaking, network clustering (or detecting the community structure in a complex network) divides the entities of a complex network into groups such that the links between entities within one group are dense while the links between different groups are sparse. Such a group is called a cluster, a module, or, most commonly, a community [1][2][3][4]. In sum, analyzing a network and partitioning it into clusters is necessary to comprehend its organization and determine its functions. Different techniques have been developed for clustering networks over the past decade; some recent surveys can be found in [5,6,7].
In most of the proposed studies, the problem of detecting community structure has been addressed as an optimization problem, that is, as the maximization or minimization of a specific objective function. Optimization-based community detection algorithms aim to find the optimal solution to this problem, and their success mainly depends on the adopted objective function and evolutionary operators. Optimizing this objective function is often difficult and is known in the literature to be an NP-hard problem. Therefore, numerous studies have adopted diverse metaheuristic algorithms such as the GA (Genetic Algorithm), MA (Memetic Algorithm), PSO (Particle Swarm Optimization), and ACO (Ant Colony Optimization) algorithms [8][9][10][11][12][3].
Abduljabbar Iraqi Journal of Science, 2023, Vol. 64, No. 8, pp: 4228-4243
Most community detection studies, such as those by Cao et al. and Liu et al. [13,14], were developed based on metaheuristic algorithms that optimize the well-known single-objective function modularity ($Q$), introduced by Newman and Girvan in 2004 [15]. In 2007, Fortunato and Barthélemy [16] found that the communities identified by methods based on optimizing $Q$ tend to be large and that these methods may fail to detect small communities, which leads to the known resolution limit problem. To avoid this problem, Li et al. [17] designed a modularity density ($D$) function and, for exploring the structure of complex network communities at different resolution levels, developed a general modularity density ($D_\lambda$) by adding a special parameter ($\lambda$) to the $D$ function. The general modularity density ($D_\lambda$) sums, over the hidden communities of the complex network, an average-degree measure of each community [18]. Single-objective community detection methods, most of which are based on evolutionary algorithms, have proven effective in addressing the problem in both synthetic and real-world complex networks [8,18,19,20].
The particle swarm optimization (PSO) algorithm is another well-regarded metaheuristic method that was initially suggested to tackle single-objective continuous optimization problems [21,22]. PSO employs a set of particles that explore the search landscape by moving locally and globally to identify the optimal solution. The movement strategy of the particles is inspired by the movement of a bird swarm: each particle keeps track of its coordinates in the search space and of the best solutions found so far, both by itself (the local optimum) and by the swarm (the global optimum) [3,23,24]. To identify the best solution, each particle moves toward its local best solution as well as toward the global best solution. PSO has proven effective in resolving different continuous optimization problems [3].
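For reference, the continuous PSO movement described above is commonly written in the canonical inertia-weight form sketched below. This form is not stated in the source. The symbols $\omega$ (inertia weight), $c_1$ and $c_2$ (acceleration coefficients), $r_1$ and $r_2$ (uniform random numbers in $[0, 1]$), $p_i$ (personal best of particle $i$), and $g$ (global best) follow the usual textbook notation and are assumptions here.

```latex
v_i^{t+1} = \omega\, v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1}
```

Community labels are discrete, so this velocity update does not apply directly to a partition; the method in this paper instead moves particles with crossover and mutation operators.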
In order to address the problem of community structure detection more efficiently using the PSO algorithm, in this paper we have developed a method called IPSO-Net (Improved PSO for community detection in a modular complex network) that integrates the Particle Swarm Optimization (PSO) framework introduced in 2018 by Abdollahpouri et al. [25] with an improvement operator introduced in 2019 by Moradi and Parsa [8], which relies on identifying and resetting the complex network nodes that appear to have been mapped into the wrong communities. The key advantages of the PSO algorithm compared to other existing methods are its quick convergence, its uncomplicated implementation, and its large number of variants [3]. IPSO-Net employs the general modularity density as a fitness function and uses perturbation operators, namely crossover and mutation, within the PSO framework to discover communities in networks. To assess its performance, several systematic experiments were carried out on modular networks of different sizes and complexities. The obtained results showed that integrating the improvement operator into the PSO algorithm has a positive effect and significantly enhances the performance of the PSO algorithm in terms of convergence reliability.
The remaining sections of the paper are organized as follows: Section 2 provides a brief overview of related work. Section 3 presents a detailed description of the proposed PSO algorithm in terms of the adopted fitness function (i.e., the general modularity density optimization model) and perturbation operators, as well as a detailed explanation of the solution improvement operator. Section 4 presents the experimental settings in terms of the dataset used, the standard evaluation metrics, and the PSO parameter settings, followed by the experimental results. Finally, Section 5 summarizes the research work and presents some conclusions and future work.

2. Literature Review
Recently, many research efforts have been made to adopt the PSO algorithm to capture hidden complex community structures in different types of networks [25][26][27][28][29][30]. Abdollahpouri et al. [25] suggested a novel method, called PSO-Net, for community detection based on a new version of the PSO framework. The proposed algorithm selected the modularity function as an objective function. Moreover, PSO-Net changed the particles' moving strategy by applying a crossover operator between each particle and its personal best location and the global best location over the whole swarm. After that, the 1-point neighbor mutation operator was applied to avoid falling into a local optimal situation. Experiments confirmed the effectiveness of the proposed algorithm PSO-Net in discovering communities over real and synthetic networks.
Cai et al. [26] proposed Q-PSO, a new algorithm based on the modularity function, to accurately and effectively detect community structure in several representative complex networks and synthetic benchmark (LFR) networks. Chen et al. [27] put forward a novel algorithm, P-PSO (particle swarm optimization based on the Physarum model), for detecting communities by exploiting the computational power of Physarum, a type of slime mold. The P-PSO algorithm improved the effectiveness of PSO by identifying the outer edges of communities based on a Physarum-inspired network model.
Cai et al. [28] proposed a greedy discrete PSO algorithm to detect community structure in large complex social networks. The statuses of particles were redefined for a discrete scenario, and the status update rules were reconsidered based on the network topology. In addition, a greedy strategy was introduced to guide particles into a promising area. Shi et al. [29] suggested a novel method based on PSO to discover complex community structures by applying the modularity model as an optimization function. Initially, an enhanced spectral method was employed to represent community detection as a clustering problem, and a weighted distance that combines eigenvectors and eigenvalues was developed to measure the difference between two nodes. Xiaodong et al. [30] proposed a new detection model based on PSO to discover complex web communities within the network without prior knowledge of domain information.
In order to effectively detect community structure in complex networks and guide the particles' movement towards optimal regions when employing the modularity density metric as an objective function, we have continued this line of work by improving the performance of the Particle Swarm Optimization (PSO) algorithm introduced by Abdollahpouri et al. [25] using a local improvement operator introduced in 2019 by Moradi and Parsa [8].

3. Material and Methods
In this section, the proposed IPSO-Net method is described in detail. The framework of the IPSO-Net method consists of three main steps: an initialization strategy (i.e., particle structure representation scheme and fitness computation), a movement strategy (i.e., search strategy), and an improvement operator that identifies and resets complex network nodes that seem to belong to other communities. The flowchart of the suggested PSO algorithm for determining the community structure in modular complex networks is shown in Figure 1. A detailed explanation of each of the above steps is provided in the following subsections.

3.1 Particle Structure Representation Scheme and Initialization Process
The proposed IPSO-Net algorithm utilizes the string encoding strategy as its representation scheme [31]. With this strategy, a network partition is encoded as an integer string $X = \{x_1, x_2, \dots, x_n\}$, where $n$ indicates the number of network vertices and $x_i$ denotes the integer cluster identifier of vertex $v_i$, with values ranging from 1 to $n$. To accelerate the convergence of the proposed optimization algorithm, a biased initialization was applied instead of a fully random one. In practice, a vertex $v_i$ is chosen at random and its cluster identifier $x_i$ is assigned to all of its neighbors [32]. This process was performed $\alpha \cdot n$ times for each particle when initializing the population, with $\alpha$ set to 0.3 in this paper.
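The string encoding and the biased initialization can be illustrated with a minimal Python sketch. It is not the author's Matlab implementation. The function name `biased_init`, the 0-based vertex indices, the neighbor-list input, and the starting state in which every vertex holds its own label are all assumptions made for illustration.

```python
import random

def biased_init(adjacency, alpha=0.3, rng=None):
    """Build one particle: entry i is the community label of vertex i.

    adjacency: list of neighbor lists (0-based vertex indices).
    alpha:     fraction of n that sets how many bias steps are performed.
    """
    rng = rng or random.Random()
    n = len(adjacency)
    # Assumed starting state: every vertex in its own community (labels 1..n).
    labels = list(range(1, n + 1))
    # Bias step, repeated alpha * n times (rounded down): pick a random
    # vertex and copy its label to all of its neighbors.
    for _ in range(int(alpha * n)):
        v = rng.randrange(n)
        for u in adjacency[v]:
            labels[u] = labels[v]
    return labels

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
particle = biased_init(graph, alpha=0.3)
```

Because neighbors inherit a label in one step, each particle starts with a few small, connected groups rather than a fully random labeling.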

3.2 Fitness Computation
In this study, the general modularity density ($D_\lambda$) has been employed as the objective function in IPSO-Net to obtain the community structure of complex networks at different resolutions. Consider an undirected complex network $G = (V, E)$, where $V$ represents the set of vertices and $E$ represents the set of edges (or connections). One way to represent $G$ is through a binary adjacency matrix $A_{n \times n}$, where $n$ denotes the number of network vertices, such that element $a_{ij}$ is equal to 1 when there is an edge between vertices $i$ and $j$, and 0 otherwise. Let $V_1$ and $V_2$ be the vertex sets of sub-networks $G_1$ and $G_2$, respectively. Then $L(V_1, V_2) = \sum_{i \in V_1, j \in V_2} a_{ij}$ is the number of edges between $G_1$ and $G_2$, $L(V_1, V_1) = \sum_{i, j \in V_1} a_{ij}$ is the internal degree of $G_1$, and $L(V_1, \bar{V}_1) = \sum_{i \in V_1, j \in \bar{V}_1} a_{ij}$ is the external degree of $G_1$, where $\bar{V}_1 = V - V_1$. Given sub-networks $G_1(V_1, E_1), \dots, G_m(V_m, E_m)$ of a complex network $G$ provided by a particle $X$, the objective function (general modularity density $D_\lambda$) can be defined, following [17], as:

$$D_\lambda = \sum_{i=1}^{m} \frac{2\lambda\, L(V_i, V_i) - 2(1 - \lambda)\, L(V_i, \bar{V}_i)}{|V_i|}$$

where each term of the sum reflects the difference between the average internal degree and the average external degree of sub-network $G_i$, weighted by $\lambda$.

To identify the best global particle in the swarm, the particles are sorted in descending order of fitness value, and the particle with the highest fitness value is chosen as the best global solution. A high fitness value indicates a high-quality community structure with dense intra-community connections. The parameter $\lambda$ is employed to explore the complex network topology at different resolutions. If $\lambda$ is equal to 0, $D_\lambda$ tends to aggregate the network into large communities, whereas if $\lambda$ is equal to 1, $D_\lambda$ tends to divide the network into small communities; when $\lambda$ is equal to 0.5, $D_\lambda$ is equivalent to the modularity density function $D$ [17,18,33].
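As an illustration of the fitness computation, the following Python sketch evaluates the general modularity density of a labeled partition. It assumes the form given by Li et al. [17], $D_\lambda = \sum_i \left[2\lambda L(V_i, V_i) - 2(1-\lambda) L(V_i, \bar{V}_i)\right] / |V_i|$. The function name and the neighbor-list input are hypothetical choices, not taken from the paper.

```python
def general_modularity_density(adjacency, labels, lam=0.5):
    """General modularity density of a partition (assumed form, see lead-in).

    adjacency: list of neighbor lists of an undirected graph without self-loops.
    labels:    labels[v] is the community identifier of vertex v.
    lam:       resolution parameter lambda in [0, 1].
    """
    internal = {}  # community -> L(V_i, V_i); each internal edge is counted twice
    external = {}  # community -> L(V_i, complement of V_i)
    size = {}      # community -> |V_i|
    for v, neighbors in enumerate(adjacency):
        c = labels[v]
        size[c] = size.get(c, 0) + 1
        internal.setdefault(c, 0)
        external.setdefault(c, 0)
        for u in neighbors:
            if labels[u] == c:
                internal[c] += 1
            else:
                external[c] += 1
    return sum(
        (2 * lam * internal[c] - 2 * (1 - lam) * external[c]) / size[c]
        for c in size
    )

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
split = general_modularity_density(graph, [1, 1, 1, 2, 2, 2], lam=0.5)
merged = general_modularity_density(graph, [1, 1, 1, 1, 1, 1], lam=0.5)
```

On this toy graph the two-triangle partition scores higher than the single merged community, which is the behavior the fitness function is meant to reward.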

3.3 Movement Strategy (Search Strategy)
PSO's search strategy depends on the mechanism of moving the particles towards their best local position while also moving them towards the best global position in the swarm. To guide the movement of each particle to the optimal possible positions, perturbation genetic operators (like crossover and mutation) are used. Below, the movement strategy steps of IPSO-Net are demonstrated in detail [3,25].

Moving towards the best personal (local) position
At first, a 2-point crossover operator is performed between each particle and its best personal (local) position. Two new solutions are obtained as a result of applying the crossover operator. The two solutions are then compared, and the one with the highest fitness value is chosen as the temporary position of the present particle. Figure 2 illustrates an example of the 2-point crossover operator. Figures 2(a) and 2(b) show, respectively, two random solutions representing parent 1 and parent 2 with their respective topological community structures.
In Figure 2(c), two arbitrary points, $p_1 = 5$ and $p_2 = 7$, are selected. The first child is then produced by copying the cluster identifiers from the beginning of parent 1 up to point $p_1$, cloning the portion from $p_1$ to $p_2$ from parent 2, and cloning the cluster identifiers of the remaining nodes from parent 1. The second child is produced by doing the same with the roles of the two parents reversed. Figure 2(d) shows the string-encoded representation of the first child with its related graphical division.
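The crossover-based move can be sketched in Python as follows. This is an illustrative sketch only: the function names `two_point_crossover` and `move_towards` are hypothetical, the cut points are treated as 0-based slice indices (the paper's figure may count positions differently), and `fitness` stands for any callable that scores a label string.

```python
import random

def two_point_crossover(parent1, parent2, p1, p2):
    """Exchange the segment between cut points p1 <= p2 (slice indices)."""
    child1 = parent1[:p1] + parent2[p1:p2] + parent1[p2:]
    child2 = parent2[:p1] + parent1[p1:p2] + parent2[p2:]
    return child1, child2

def move_towards(particle, guide, fitness, rng=None):
    """Cross a particle with a guide position and keep the fitter child.

    The guide is the personal best in the first move and the global best
    in the second move.
    """
    rng = rng or random.Random()
    # Draw two distinct cut points in 0..n and order them.
    p1, p2 = sorted(rng.sample(range(len(particle) + 1), 2))
    children = two_point_crossover(particle, guide, p1, p2)
    return max(children, key=fitness)

first, second = two_point_crossover([1] * 9, [2] * 9, 4, 7)
# first  is [1, 1, 1, 1, 2, 2, 2, 1, 1]
# second is [2, 2, 2, 2, 1, 1, 1, 2, 2]
```

Calling `move_towards` once with the personal best and once with the global best reproduces the two consecutive moves described in this subsection.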

Moving towards the best global position in the swarm
After each particle has moved towards its personal (local) best position, it also moves towards the best global position in the swarm. To achieve this, a 2-point crossover operator is performed between the particle's temporary position (obtained in the previous step) and the best global position. As with the former crossover operator, two new solutions are obtained and compared, and the better of the two is selected as the particle to which the mutation operator is then applied.

Mutation
Lastly, the particles are mutated over the entire search landscape using the 1-point mutation operator. With the predetermined mutation probability, a random node of the given particle ($X_i$) is picked, and its cluster identifier is replaced by the cluster identifier of one of its adjacent nodes, which ensures that only feasible solutions are generated [3]. The output of the mutation operator for the particle is $X_i'$, which is compared with its personal best ($P_{i,best}$). If the fitness value of $X_i'$ exceeds the fitness value of $P_{i,best}$, then $P_{i,best}$ is replaced by $X_i'$; otherwise, $P_{i,best}$ remains unchanged. When all the particles have moved and their personal best positions have been updated, the fitness values are computed again using the general modularity density measure, and the particle with the highest fitness value is chosen as the best global position of the entire swarm. The above process is repeated until the predetermined number of iterations has been reached.
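The mutation step and the personal-best update can be sketched as below. The names `neighbor_mutation` and `update_personal_best` are hypothetical. Restricting the new label to labels of adjacent nodes that differ from the current one is an interpretation of the description above, not a detail confirmed by the source.

```python
import random

def neighbor_mutation(labels, adjacency, pm=0.3, rng=None):
    """With probability pm, relabel one random node with a neighbor's label."""
    rng = rng or random.Random()
    mutated = list(labels)
    if rng.random() < pm:
        v = rng.randrange(len(mutated))
        # Candidate labels: those carried by adjacent nodes, excluding the
        # node's own label (assumption: a mutation should change the label).
        candidates = sorted({mutated[u] for u in adjacency[v]} - {mutated[v]})
        if candidates:
            mutated[v] = rng.choice(candidates)
    return mutated

def update_personal_best(candidate, personal_best, fitness):
    """Keep the candidate only if it is strictly fitter than the personal best."""
    if fitness(candidate) > fitness(personal_best):
        return candidate
    return personal_best

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
mutated = neighbor_mutation([1, 1, 1, 2, 2, 2], graph, pm=1.0)
```

Drawing the new label from neighboring nodes means a mutation can only move a node into a community it is already linked to.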
In order to enhance the proposed algorithm's performance and exploit the available knowledge about the problem, a local solution improvement operator is proposed whose main idea is to identify and reset the complex network nodes that seem to belong to other communities in case the majority of their topological links do not belong to their current communities. The details of the solution improvement operator are shown in the next section.

3.4 Solution Improvement Operator
When looking closely at the partitions detected for a given complex network, we can note that some nodes are located in the wrong communities: the majority of their connections do not belong to their current communities, so they seem to belong to another community. Generally, a community within a network represents a set of closely linked nodes whose number of internal connections is greater than the number of their connections with the nodes of other communities. To this end, each node is assigned a computed correspondence value. This value is calculated for a given node ($v_i$) by counting the number of its links whose targets do not belong to its current community. Accordingly, the nodes with relatively high correspondence values are moved into a new community. Moradi and Parsa [8] proposed the above solution improvement method in 2019 as a local search operator inside the genetic algorithm framework with the locus encoding representation. This local search operator decreases the inter-community connections in order to discover high-quality clusters in the complex network. It is a very helpful operator that speeds up the population convergence and enhances the accuracy of the detected communities. Here, we have adopted Moradi and Parsa's local search operator as a solution improvement method inside the PSO algorithm framework with the string encoding representation. The complete pseudo-code of the IPSO-Net algorithm for community detection is depicted in Algorithm 1, including the subprogram of the solution improvement operator.
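A minimal Python sketch of the improvement idea is given below. It is one possible reading of the operator rather than the pseudo-code of Algorithm 1. The function name `improve_solution` is hypothetical. The strict-majority threshold, the choice of the target community as the one holding most of the node's neighbors, and the tie-break on the smallest label are all assumptions.

```python
from collections import Counter

def improve_solution(labels, adjacency):
    """Reassign nodes whose links mostly point outside their community."""
    improved = list(labels)
    for v, neighbors in enumerate(adjacency):
        if not neighbors:
            continue  # isolated nodes carry no evidence either way
        counts = Counter(improved[u] for u in neighbors)
        # Correspondence value: links whose targets lie in other communities.
        outside = len(neighbors) - counts.get(improved[v], 0)
        if outside > len(neighbors) / 2:
            # Assumed rule: move to the community that holds most of the
            # node's neighbors, breaking ties by the smallest label.
            improved[v] = min(counts, key=lambda c: (-counts[c], c))
    return improved

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3; node 2 is misplaced.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
repaired = improve_solution([1, 1, 2, 2, 2, 2], graph)
# repaired is [1, 1, 1, 2, 2, 2]
```

Nodes are visited in index order and each decision sees the labels already updated in the same pass, which is a design choice of this sketch.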

4. Experimental Results
In this section, we describe in detail the experimental settings, namely the dataset used, the standard evaluation metrics, and the PSO parameter settings, and then discuss the results obtained from the experiments.

4.1 Dataset
This study has made use of recently published, publicly available modular networks for validating community detection algorithms. These networks have been generated using a flexible and simple benchmark generator, called FARZ, introduced by Fagnan et al. in 2018 [34]. The FARZ model is similar to LFR in that it generates complex networks with a built-in community structure that can be used as ground truth, which is ideal for validating the performance of community detection algorithms. FARZ generates dependable networks in the sense that it creates communities and networks that are characteristically similar to those of real-world networks. It also has intuitive parameters with meaningful interpretations and is easy to tune for direct control of the generated networks' properties. There are 3 input parameters in FARZ ($n$; $m$; $k$), which respectively determine the number of nodes, the average degree, and the number of communities. There are also 4 intuitive control parameters in FARZ, $\beta$, $\alpha$, $\gamma$, and $\phi$, which respectively control the community structure strength, the clustering coefficient, the degree correlation, and the distribution of the community sizes. In the next sections, these networks will be referred to by the value of their control parameter $\beta$ [34], since $\beta$ is the parameter responsible for controlling the strength of the community structure. Accordingly, the values of $\beta$ in this study have been varied over the range 0.0 to 0.8 with a step of 0.05, and diverse arbitrary complex networks of different sizes (from 50 to 250 nodes) were generated, each composed of four communities. The parameters used when generating FARZ networks are listed in Table 1.

Table 1: The parameters used when generating FARZ networks
Parameter | Description | Value
Config. parameters
$\phi$ | The constant added to all community sizes. It is responsible for moving the community size distribution from heavy tail to uniform | 1
$\epsilon$ | The probability of noisy/random edges | 1e-07
Overlap parameter
$r$ | The maximum number of communities each node can belong to | 1

4.2 Evaluation Measures
In this paper, we have used both the modularity ($Q$) and the normalized mutual information (NMI) measures to assess the quality of the obtained community structures. NMI [35,36] is a criterion for measuring the similarity between the community structure produced by the proposed algorithm and the real community structure of a given network. Let $R = \{R_1, \dots, R_T\}$ represent the real clusters of a given complex network, and let $S = \{S_1, \dots, S_D\}$ represent the clusters obtained by the proposed algorithm, where $T$ and $D$ denote, respectively, the number of communities present in the partitions $R$ and $S$. To calculate the NMI measure, we first form a confusion matrix $C = [c_{ij}]$, $i = 1, 2, \dots, T$ and $j = 1, 2, \dots, D$, where $c_{ij}$ represents the number of nodes that appear both in the community $R_i \in R$ and in the community $S_j \in S$. Accordingly, $NMI(R, S)$ can be defined as:

$$NMI(R, S) = \frac{-2 \sum_{i=1}^{T} \sum_{j=1}^{D} c_{ij} \log\left(\frac{c_{ij}\, n}{c_{i\cdot}\, c_{\cdot j}}\right)}{\sum_{i=1}^{T} c_{i\cdot} \log\left(\frac{c_{i\cdot}}{n}\right) + \sum_{j=1}^{D} c_{\cdot j} \log\left(\frac{c_{\cdot j}}{n}\right)}$$

where $c_{i\cdot}$ and $c_{\cdot j}$ denote the sum of the elements of $C$ over row $i$ and column $j$, respectively. As mentioned earlier, $n$ represents the total number of nodes in the network. The NMI value ranges from 0 to 1: $NMI = 1$ indicates that $R$ and $S$ are exactly equivalent, and $NMI = 0$ means that $R$ and $S$ are totally different.
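The NMI computation can be sketched in Python using the standard confusion-matrix formulation. The function name `nmi` is hypothetical, and returning 1.0 when both partitions consist of a single community is a convention chosen here to avoid division by zero, not something stated in the source.

```python
import math
from collections import Counter

def nmi(true_labels, found_labels):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, found_labels))  # non-zero entries c_ij
    rows = Counter(true_labels)                      # row sums of C
    cols = Counter(found_labels)                     # column sums of C
    numerator = -2.0 * sum(
        c * math.log(c * n / (rows[a] * cols[b])) for (a, b), c in joint.items()
    )
    denominator = sum(r * math.log(r / n) for r in rows.values()) + sum(
        c * math.log(c / n) for c in cols.values()
    )
    if denominator == 0:
        # Both partitions are one single community (assumed convention).
        return 1.0
    return numerator / denominator

same = nmi([1, 1, 2, 2], [5, 5, 3, 3])    # identical up to relabeling
unrelated = nmi([1, 1, 2, 2], [1, 2, 1, 2])
```

Only the grouping matters, so renaming the community identifiers leaves the score unchanged.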
Modularity ($Q$) is the most common internal quality measure. It was proposed by Newman and Girvan in 2004 [15] and is mainly used to evaluate predicted solutions when the real partitions are unknown. $Q$ measures the fraction of edges that fall within communities minus the fraction expected if the edges were placed at random. $Q$ approaches 0 if the number of internal connections is similar to that of a random distribution. On the other hand, $Q$ approaches its maximum value of 1, and deviates from the null case, when all detected communities have dense intra-connections. This means that a network with a strong community structure presents a high $Q$ value. The modularity is defined as:

$$Q = \sum_{i=1}^{m} \left[ \frac{L(V_i, V_i)}{L(V, V)} - \left( \frac{L(V_i, V)}{L(V, V)} \right)^2 \right]$$

where $m$ is the number of communities, $L(V_i, V_i)$ is twice the number of edges inside community $i$, $L(V_i, V)$ is the total degree of the vertices in community $i$, and $L(V, V)$ is twice the total number of edges in the network.
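For illustration, the Newman-Girvan modularity of a labeled partition can be computed with the short Python sketch below. It uses the standard per-community form, in which each community contributes its fraction of internal edges minus the squared fraction of total degree. The function name and the neighbor-list input are hypothetical.

```python
def modularity(adjacency, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adjacency: list of neighbor lists without self-loops.
    labels:    labels[v] is the community identifier of vertex v.
    """
    two_m = sum(len(neighbors) for neighbors in adjacency)  # twice the edge count
    if two_m == 0:
        return 0.0
    internal = {}  # community -> twice the number of internal edges
    degree = {}    # community -> total degree of its vertices
    for v, neighbors in enumerate(adjacency):
        c = labels[v]
        degree[c] = degree.get(c, 0) + len(neighbors)
        internal[c] = internal.get(c, 0) + sum(1 for u in neighbors if labels[u] == c)
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
graph = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
q_split = modularity(graph, [1, 1, 1, 2, 2, 2])
q_merged = modularity(graph, [1, 1, 1, 1, 1, 1])
```

Placing every node in one community gives $Q = 0$, while the two-triangle split scores higher on this toy graph.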

4.3 Parameter Setting
The IPSO-Net algorithm was implemented in Matlab R2016b. The experiments on modular networks were performed on a computer with an Intel® Core™ i7 CPU @ 2.80 GHz and 16.0 GB (15.9 GB usable) of memory. In this paper, the number of iterations was set to 100, the population size was set to 100, and the probability of the 1-point mutation operator was set to 0.3. In addition, we investigated the impact of the parameter $\lambda$, which was varied from 0.3 to 0.7 in steps of 0.2. All experimental results are reported as the average of 10 independent runs.

4.4 Experimental Results on Modular Networks and Discussions
As community structure tends to be stronger with the increase of $\beta$, NMI increases, but it is still difficult for community detection methods to capture the correct partitions. However, IPSO-Net-1 could not find the correct partitions of the networks in all test cases. It should be noted that IPSO-Net-1 obtained the best partitions when $\beta = 0.8$ and $\lambda = 0.7$, where NMI begins to increase with the further increase of $\beta$. Given a specified $\beta$, NMI increases with the increase of $\lambda$. The reason is that IPSO-Net-1 with $\lambda$ less than 0.7 tends to provide large communities, leading to a decrease in NMI. Since IPSO-Net-1 with $\lambda$ equal to 0.7 yields the best results, we set $\lambda = 0.7$ in this study.
Lastly, the results of the proposed algorithm in its IPSO-Net-2 version when $\beta = 0.8$ were compared with the PSO-Net algorithm of [25], which employed the modularity function as a single-objective function. For the sake of fairness, the common parameters were set to the same values for both methods, i.e., the number of iterations was set to 100 and the population size was set to 100. In addition, all experimental results are reported as the average of 10 independent runs. Table 2 shows the comparison in terms of NMI and $Q$ between the results obtained by both methods, IPSO-Net-2 and PSO-Net. The results in Table 2 indicate that the proposed algorithm in the IPSO-Net-2 version significantly outperformed the counterpart algorithm of [25] (i.e., PSO-Net) in terms of NMI and recorded satisfactory results in terms of $Q$. According to the related literature, both measures, NMI and $Q$, are considered appropriate metrics for evaluating a given solution. In fact, there is no strict positive correlation between these two metrics [37,38].
From the presented results, it can be concluded that the solution improvement operator, which is developed based on identifying and resetting the complex network nodes in cases where most of their topological links do not belong to their home communities, has enhanced the prediction power of the PSO algorithm.

5. Conclusion
In this paper, a particle swarm optimization algorithm with a solution improvement operator is proposed to capture the hidden community structure in modular complex networks using the modularity density metric as a fitness function. The solution improvement operator enhanced PSO performance by guiding the particles toward a better region of the solution space. The key idea behind the improvement operator is to identify the nodes that appear to be located in the wrong communities, namely nodes for which the majority of connections do not belong to their current communities, and to reassign them to the community they appear to belong to. The experimental results showed the proposed method's effectiveness over modular networks of different complexities and sizes. In the future, we will focus on enhancing the PSO's performance using an improvement operator based on strong or weak community concepts and on adopting other known community detection models as fitness functions. Moreover, we plan to generate different networks and discuss the effect of other control parameters on the performance of the proposed algorithm.