Adaptive Learning System of Ontology using Semantic Web to Mining Data from Distributed Heterogeneous Environment

Ontology learning for describing heterogeneous systems has become an influential way to enhance the effectiveness of such systems through social network representation and analysis (SNA). This paper presents a novel scenario for constructing an adaptive architecture that develops community performance, with heterogeneous communities as a case study. Crawling the semantic web is used as a new approach to create a large data repository for classifying these communities. The architecture of the proposed system involves two cascading modules for acquiring the ontology data, which is represented in Resource Description Framework (RDF) format. The proposed system enhances these environments by combining semantic web and SNA tools. Its contribution appears clearly in community production and skills development. The Python 3.9.0 platform was used for data pre-processing, feature extraction, and classification via Naïve Bayes and support vector machine (SVM) classifiers. Two case studies were conducted to test the accuracy of the proposed system. The accuracy rates for the two case studies were 90.771% and 90.1149%, respectively, which is considered effective when compared with the related scenario on the same data set.

A. Bohn et al. (2011) [8] proposed "content-based SNA", a process that combines social network analysis with text mining. They demonstrated how this combination can be used to represent people's interests and to determine whether authors with shared interests communicate with each other, using the R mailing lists R-help and R-devel. They found that the expected positive association between shared interests and communication becomes stronger as a person's centrality scores increase. P. Kazienko et al. (2011) [9] described core analyses and approaches that are useful for developing organizational structure and that are centered on a social network approach. The concepts introduced in this framework are based on a social network that was created using physical production business information. Organizational social network analysis was shown to be effective as a decision-support method for managing a project. B. Hoppe (2010) [11] provided a basis for describing various types of organizational networks, together with case studies describing common results associated with each type of network. The area of management development faces difficulty in assessing leadership networks. Under a broader framework, they approximate the nature of relationships among individuals, organizations, priorities, interests, and other institutions. A. Chin et al. (2007) [12] defined a tool for identifying groups of bloggers that combines a sense-of-community assessment with social network analysis (SNA). The approach was applied to a blog about European online music. They explored the essential principles of social networks created by blogger relationships, as well as how several characteristics contribute to the respondents' (patients and practitioners of blogs) sense of community. X. Shi (2007) [13] attempted to build query networks from website query logs, in which nodes are queries and edges indicate semantic similarity between queries.
Users' query histories were collected from the query logs and then partitioned into query frames to create the network. They compared the constructed query networks with similar nonlinear systems and concluded that query connections are of low importance. A. Mislove et al. [14] presented a large-scale measurement study and analysis of several online communities. They examined data from Flickr, YouTube, LiveJournal, and Orkut, four prominent online social networks. They observed that the degree of user nodes appears to match the out-degree, that the networks include a densely interconnected kernel of moderate nodes, and that this kernel connects limited g. J. Golbeck (2007) [15] presented the first comprehensive survey of web-based social networks, followed by an analysis of membership and relationship dynamics within them. This work drew several conclusions on how users behave in social networks and which network features correlate with that behavior. D. Cai et al. (2005) [16] analysed the problem of mining hidden communities in heterogeneous social networks. Based on the observation that different relations have different degrees of importance with respect to a given query, they suggested a new approach for learning the right combination of these relations to satisfy the user's needs. Better performance for group m can be achieved with the resulting relation.

Problem statement and main objective
Predicting and analysing data from real, heterogeneous environments (communities) to support their continued survival, the expansion of their activities, and the enhancement of their production is a complex problem. The main issue facing the analysis and mining of this data is that these communities possess big data with different features and multiple entities, so it is difficult to analyse and classify these features and then determine the significance of each. The main contributions of the proposed system are: a. Data processing and representation using the Resource Description Framework (RDF) format. b. Training using the processed data. c. Extracting valuable data by mining the active or vital entities. d. Testing the proposed system with real data.

Theoretical background
This section introduces the theoretical background of the techniques used for mining and classifying the data embedded in social datasets.

Social network analysis Tools
Social network analysis is the assessment of relationships between individuals, organizations, and other data-processing entities [5]. According to researchers, it offers a method for examining the interactions between individuals, represented as binary or weighted adjacency matrices [17], and SNA can be useful in predicting the ways in which entities cooperate with organizations [18]. Online social networks have expanded rapidly in recent years and remain among the most popular websites on the Internet. They act as a foundation for facilitating contact and for identifying users with similar interests. Social network analysis [19] applies mathematical methods and concepts to describe the structure of interpersonal relationships.

Social Networks Representation
A social network is modelled as a graph [20]; the graph model is the basis for representing groups. A graph G = (V, E) consists of a set V of vertices (or nodes) and a set E of edges that connect pairs of vertices [21].
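As a minimal sketch of this representation, the following Python snippet builds an undirected graph from a vertex list and an edge list and derives its adjacency matrix. The actor names and relations are hypothetical and are not taken from the paper's datasets.

```python
# Hypothetical actors (vertices) and relations (edges); not from the paper's data.
vertices = ["ann", "bob", "cat", "dan"]
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]


def adjacency_matrix(vertices, edges):
    """Return the symmetric 0/1 adjacency matrix of an undirected graph."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix


A = adjacency_matrix(vertices, edges)
for name, row in zip(vertices, A):
    print(name, row)
```

A weighted network could be handled the same way by storing the relation strength instead of 1 in each cell.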

Centrality Measures Tools
In network analysis, several basic indicators of importance are commonly used [22]. Degree centrality is the most intuitive indicator of a vertex's importance in a network. Given a graph G = (V, E) that is represented by the adjacency matrix A, Eq. (1) [23] describes how to calculate the degree centrality $C_D(v_i)$ of a vertex $v_i \in V$:

$$C_D(v_i) = d_i = \sum_{j} A_{ij} \tag{1}$$

A vertex's closeness centrality $C_C(v_i)$ is described in Eq. (2):

$$C_C(v_i) = \frac{1}{\sum_{v_j \in V,\, j \neq i} d(v_i, v_j)} \tag{2}$$

where $d(v_i, v_j)$ is the length of the shortest path between $v_i$ and $v_j$.

The idea behind eigenvector centrality is that a vertex is central if it has many central neighbours. It is estimated using Eq. (4) [24]:

$$x_i = \frac{1}{\lambda} \sum_{v_j \in N_i} x_j \tag{4}$$

where $N_i$ is the neighbourhood of the vertex $v_i$ and $x_j$ is the centrality score of $v_j$. This implies $Ax = \lambda x$, i.e., x is an eigenvector of the matrix A.
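The three centrality measures can be illustrated with a short, dependency-free Python sketch. It is not the paper's implementation: the 4-node adjacency matrix is hypothetical, closeness is taken as the reciprocal of the summed shortest-path distances (one common convention, assumed here), and eigenvector centrality is approximated by power iteration on A.

```python
from collections import deque

# Hypothetical 4-node undirected, connected graph given as an adjacency matrix.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def degree_centrality(A):
    """Degree of each vertex: the row sums of A."""
    return [sum(row) for row in A]


def closeness_centrality(A):
    """Reciprocal of the summed shortest-path distances (connected graph assumed)."""
    n = len(A)
    scores = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted distances
            u = queue.popleft()
            for v in range(n):
                if A[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores.append(1 / sum(dist.values()))
    return scores


def eigenvector_centrality(A, iterations=200):
    """Power iteration towards the dominant eigenvector of A (A x = lambda x)."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    return x


degree = degree_centrality(A)
closeness = closeness_centrality(A)
eigen = eigenvector_centrality(A)
print(degree, closeness, eigen)
```

In this toy graph the third vertex is adjacent to every other vertex, so it ranks first under all three measures.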

Naïve Bayes Classifier
The Naïve Bayes classifier is a probabilistic method that assumes the features are independent of one another given the class [25]. "Parameter estimation uses the method of maximum likelihood". Figure 1 presents the phases of this technique, and Eq. (5) presents the Bayesian relation:

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \tag{5}$$

P(H|X) is the posterior probability of the hypothesis H conditioned on X, while P(H) is the a priori probability of H. P(X|H) is the posterior probability of X conditioned on H, and P(X) is the a priori probability of X.
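To make the Bayesian relation concrete, the sketch below implements a small categorical Naïve Bayes classifier in plain Python. It is an illustration only: the two-feature "y"/"n" samples and the class labels "A" and "B" are hypothetical, and add-one (Laplace) smoothing is an assumption that the paper does not mention.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for features, label in zip(samples, labels):
        for index, value in enumerate(features):
            value_counts[(index, label)][value] += 1
    return class_counts, value_counts


def posterior(features, class_counts, value_counts, values_per_feature=2):
    """Return P(H | X) for every class H, using add-one smoothing."""
    total = sum(class_counts.values())
    joint = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(H)
        for index, value in enumerate(features):
            seen = value_counts[(index, label)][value]
            score *= (seen + 1) / (count + values_per_feature)  # P(x_i | H)
        joint[label] = score  # proportional to P(X | H) * P(H)
    evidence = sum(joint.values())  # P(X)
    return {label: score / evidence for label, score in joint.items()}


# Hypothetical training data: two binary attributes per sample.
samples = [("y", "y"), ("y", "n"), ("n", "n"), ("n", "y"), ("y", "y"), ("n", "n")]
labels = ["A", "A", "B", "B", "A", "B"]

class_counts, value_counts = train_naive_bayes(samples, labels)
result = posterior(("y", "y"), class_counts, value_counts)
print(result)
```

The predicted class is the one with the largest posterior; dividing by the evidence P(X) only rescales the scores so that they sum to one.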

SVM (Support Vector Machine)
Various machine learning algorithms predict and classify data according to the data they are given. The support vector machine (SVM) is a linear model that can be used to solve classification problems: the SVM algorithm creates a line or a hyperplane that separates the data into classes [26]. The best line or hyperplane is the one that separates the dataset with the largest margin. Let us consider a 2-D training tuple $X = (x_1, x_2)$, where $x_1$ and $x_2$ are the values of attributes $A_1$ and $A_2$, respectively. The equation of a separating line in 2-D space can be written as shown in Eq. (6) [27]:

$$w_0 + w_1 x_1 + w_2 x_2 = 0 \quad [\text{e.g., } ax + by + c = 0] \tag{6}$$

where $w_0$, $w_1$, and $w_2$ are constants defining the slope and intercept of the line. Any point lying above such a hyperplane satisfies Eq. (7):

$$w_0 + w_1 x_1 + w_2 x_2 > 0 \tag{7}$$

Each point below the hyperplane, as shown in Figure 3, satisfies Eq. (8) [28]:

$$w_0 + w_1 x_1 + w_2 x_2 < 0 \tag{8}$$

The goal of this approach is to maximize the margin. The hyperplane with the maximum margin is the optimal hyperplane, as shown in Figure 3.

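The role of the hyperplane in Eq. (6) can be illustrated with a few lines of Python. This sketch does not train an SVM: the weights w0, w1, w2 and the sample points are hypothetical values chosen by hand, and the code only shows how the sign of the decision value assigns a class and how the margin is measured as the smallest distance from a point to the line.

```python
import math

# Hypothetical hyperplane parameters; a trained SVM would learn these values.
w0, w1, w2 = -3.0, 1.0, 1.0


def decision_value(x1, x2):
    """Left-hand side of the line equation: w0 + w1*x1 + w2*x2."""
    return w0 + w1 * x1 + w2 * x2


def classify(x1, x2):
    """Return +1 for points above the hyperplane and -1 otherwise."""
    return 1 if decision_value(x1, x2) > 0 else -1


def distance_to_hyperplane(x1, x2):
    """Geometric distance of a point from the separating line."""
    return abs(decision_value(x1, x2)) / math.hypot(w1, w2)


# Hypothetical 2-D training tuples (x1, x2).
points = [(3.0, 2.0), (0.5, 1.0), (2.0, 2.5), (1.0, 1.0)]
predicted = [classify(x1, x2) for x1, x2 in points]
margin = min(distance_to_hyperplane(x1, x2) for x1, x2 in points)
print(predicted, margin)
```

An SVM solver searches over the weights for the line whose smallest point-to-line distance is as large as possible; the snippet evaluates that quantity for one fixed candidate line.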
Ontologies for the Semantic Web
In philosophy, ontology attempts to address questions such as "What is being?" and "What characteristics do all beings share?" [29,30].
Ontology plays an essential part in the semantic web by providing systematic descriptions of terms and relationships [31]. Figure 4 describes the general layout of an ontology learning framework. The ontology learning framework integrates the ontology of a specific domain with a metadata repository. Data pre-processing, website crawling, and term extraction form the first step in ontology learning [19]. This data is then filtered and classified using a support vector machine. Ontology matching is the process of determining correspondences between two separate ontologies; it addresses problems involving the integration and evolution of heterogeneous ontologies, which is a critical challenge in Semantic Web applications [32]. An ontology is a language that can be used for interaction between a user and a machine, and it gives individuals the opportunity to link the structure of contents [33]. Because of its ground-breaking features and its connections with other areas, the Semantic Web has attracted the interest of researchers [34]. Establishing a semantic domain ontology helps reduce common issues and confusions associated with logo systems [35]. Using expert trust and applicable judgement indicates that the current ranking feature structure and domains can be improved [36]. Algorithm (1) depicts ontology learning for the proposed system.

1. Select data of the selected domain.
3. Pre-process the data for feature extraction.
4. Extract and select specific features associated with the metadata repository.
5. Apply web and blog crawling for the related target environment.
6. Mine concepts and indicate relational entities with their cardinality.
8. Execute centrality measure tools on the social data.
12. Check system performance using the clustered data (formatted as an Excel sheet report), then send feedback to the system critic.
14. Train the proposed system using the modified data.

Algorithm (1) presents ontology learning for the proposed system. It starts by retrieving the available data from the ontology database for the target domain (community), pre-processing this data, and removing redundant terms (normalization). Features are then selected in order to execute centrality measurement tools on the social data embedded in the webs and blogs of the target domain.
The two main techniques applied to classify the effective data are the standard deviation measurement and the Bayesian / SVM classifiers, respectively. Testing system performance and modifying the ontology repositories to train the proposed system form the final step in this algorithm.
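Two of the steps described above, removing redundant terms and using a standard deviation measurement to single out effective entities, can be sketched as follows. The code is only one possible reading of those steps: the term list, the centrality scores, and the rule "score greater than mean plus k standard deviations" are all assumptions made for illustration, not details given in the paper.

```python
import statistics


def normalize_terms(terms):
    """Lower-case, strip whitespace, and drop duplicate or empty terms (order kept)."""
    seen, cleaned = set(), []
    for term in terms:
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


def active_entities(score_by_entity, k=1.0):
    """Entities whose score exceeds mean + k * standard deviation (assumed rule)."""
    scores = list(score_by_entity.values())
    threshold = statistics.mean(scores) + k * statistics.pstdev(scores)
    return sorted(e for e, s in score_by_entity.items() if s > threshold)


# Hypothetical extracted terms and degree-centrality scores.
terms = normalize_terms([" Refinery", "refinery", "Pipeline ", "", "Depot"])
degrees = {"refinery": 9, "pipeline": 2, "depot": 1, "port": 2, "office": 1}
active = active_entities(degrees)
print(terms)
print(active)
```

The entities returned by `active_entities` would then be passed to the Bayesian or SVM classifier; a different threshold (the hypothetical parameter `k`) changes how selective the filter is.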

Methodology
The proposed work is based on adaptive learning and a crawling technique. Communities in the semantic web are detected by crawling and analysing the semantic web. Owing to the special characteristics of this environment, the proposed algorithm uses the SNA tools described in Eqs. (1)-(5) to classify these items, and the SVM technique to classify communities into various related clusters. Weka 3.8.4, an open-source package, is used to implement data pre-processing, feature extraction, feature selection, and clustering.

Proposed System
The proposed system, shown in Figure 5, consists of two cascading components.

Figure 5 - Block diagram of the proposed system
The first component is a repository covering a collection of semantic webs, while the second is a classification module that recognizes communities using SNA mathematical tools. The algorithm is depicted as a block diagram of the proposed system. The proposed algorithm begins with pruning ideas, comments, and likes; crawling and analysis of the semantic web then proceed.

Dataset
The first source of the proposed dataset was the Oil Production Institute. It is a public dataset containing a wide range of entities with heterogeneous relationships; some of these entities have a direct relationship with each other. The social network is then constructed by clustering actors and relations. Kaggle.com is the second source of the processed dataset; it was used in the second case study to evaluate system performance.

Web and Blog of Social Data
The following is an overview of the proposed system's resources:
A. Webs, blogs, and forums that contain posts, comments, votes, videos, etc. serve as the data seed for the proposed research. Ideas, comments, and posts are pruned using suitable and related keywords, and the crawled data then becomes the seed of the SNA measures.
B. A database containing a wide range of entities and heterogeneous relationships; some entities have direct relationships with each other while others do not.
C. SVM or Bayesian classification is used to classify nodes (actors) and their relations in order to identify common and related communities.
D. The data then passes to a filtration system that selects the data desired by the beneficiary, which is extracted via a report. The system starts counting and measuring the strength of each node and its role in the system, represented by a set of criteria as shown in Table (1).
E. Organization strengths and weaknesses are diagnosed with the assistance of the critic expert. The proposed machine learning agent and LS are used to diagnose objects (entities) from the previous different views. For community problems, these different relation graphs can provide different communities.

Data Preparing with RDF Format
The RDF format is more understandable and better suited to "visual conceptualization", so the proposed system represents data in RDF format. The processed data comprises 423 instances (samples), each with 15 attributes.

Figure 7 - Snapshot of social network data
As shown above, the ids represent nodes (entities), while the links with http or verbs (concepts) are edges between these nodes. This sample data can be represented as in Table (3).
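The mapping from RDF statements to nodes and edges can be sketched in Python as below. The triples use made-up `http://example.org/` URIs in a simplified N-Triples-like layout; they are not the paper's data, and the line splitter is a deliberately naive stand-in for a real RDF parser (it would not handle literals containing spaces, for example).

```python
# Hypothetical triples, one per line: <subject> <predicate> <object> .
raw_triples = """
<http://example.org/id/1> <http://example.org/worksWith> <http://example.org/id/2> .
<http://example.org/id/2> <http://example.org/reportsTo> <http://example.org/id/3> .
<http://example.org/id/1> <http://example.org/reportsTo> <http://example.org/id/3> .
"""


def parse_triples(text):
    """Split each non-empty line into a (subject, predicate, object) tuple."""
    triples = []
    for line in text.strip().splitlines():
        parts = line.strip().rstrip(".").split()
        if len(parts) == 3:
            triples.append(tuple(part.strip("<>") for part in parts))
    return triples


triples = parse_triples(raw_triples)

# Subjects and objects become nodes; each predicate labels an edge between them.
nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
edges = [(s, o, p.rsplit("/", 1)[-1]) for s, p, o in triples]

print(nodes)
print(edges)
```

The resulting node and edge lists are the inputs needed by graph-based measures such as degree or closeness centrality.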

Political Vote Community
The second case study discussed in this paper, chosen as a clear and influential community for ontology learning, is a political vote community. Various samples with 15 attributes were used to classify this environment. A support vector machine (SVM) was used for classification after training and testing on the data. The data of this community is freely available for use in different studies. Table (4) presents a classification of 28 samples for the political vote community using the support vector machine technique.

Training and Testing the data set
The dataset contained 1654 samples (records) that belong to two labelled classes. The dataset was randomly split into two parts: the first part contains 80% of the records (tuples), which were used to train the model, while the second part contains the remaining 20% as a validation set used for testing the model's performance. Figure 8 depicts a snapshot of the results for the political vote community. Python 3.9.0, an open-source platform, was used to implement the proposed algorithm, from pre-processing and feature extraction and selection to the support vector machine and Bayesian classifiers, for all 423 instances.
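A random 80/20 split of this kind can be written with the Python standard library alone. The sketch below is illustrative: the integer records stand in for the 1654 labelled samples, and the function name and fixed seed are assumptions added for reproducibility, not details from the paper.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it at the requested fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(1654))  # stand-in for the 1654 labelled samples
train, test = train_test_split(records)
print(len(train), len(test))
```

Because the split is random, a fixed seed keeps the training and validation sets identical between runs, which makes reported accuracy figures repeatable.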

Results and Discussions
The results in this section were obtained by classifying different items in heterogeneous environments via a supervised learning approach, using labelled samples to discriminate between two or more classes with the SVM technique. Various items belonged to different clusters. The precision rate of the proposed system for the vote case study using SVM was 90.1149%, as shown in Table 5, where the root mean square error was 0.2977. Table 6 shows that the precision rate for the Iraqi Oil Production and Distribution community using the Bayesian classifier was 90.771%, with incorrectly classified instances = 9.29 and root mean square error = 0.2724. Figure 9 presents a snapshot of the results for the Iraqi Oil Production and Distribution Enterprise. Precision, recall, F-measure, and ROC were used to measure community performance via Eqs. (9)-(12), respectively. The first three are defined as:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$F\text{-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{11}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives.

Figures 10 and 11 present the performance evaluation of the proposed system for the data of the above case studies. These figures compare the enhancement and efficiency features before and after the proposed SNA analysis for three series, and they show the effectiveness of the proposed system in exploiting the centrality of the active entity. This maximizes the assurance and productivity of the target domain or community. The TP rate, precision, and recall for the three series of the target community improve after adopting the proposed system.
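For reference, precision, recall, and F-measure can be computed from confusion-matrix counts as in the following Python sketch. It uses the standard definitions of these metrics; the counts are hypothetical and do not correspond to the results reported in Tables 5 and 6.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that are recovered (the TP rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts of true positives, false positives, and false negatives.
tp, fp, fn = 90, 10, 30

p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(p, r, f)
```

The guards against a zero denominator return 0.0 when a class is never predicted or never present, which is a common convention rather than part of the definitions.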

Conclusions
This paper presents a novel scenario for constructing an adaptive architecture that develops community performance using ontology learning via SNA with two classification algorithms. The support vector machine and the Naïve Bayes classifier are the classification techniques used in this research for separating the heterogeneous classes. Crawling the semantic webs and blogs is an approach to classifying communities. The proposed architecture involves two cascading modules for acquiring ontology data. The graph is the main tool for representing ontologies using vertices and their edges. Moreover, the contribution clearly appears in community production and skills development. The proposed system supports evaluating and retaining relations and enhancing features. The accuracy rate of the proposed system when applying social network analysis tools was 90.1149% for the vote community, which is effective when compared with another scenario for the same problem, while the precision rate for the Iraqi Oil Production and Distribution community was 90.771% using the Bayesian classifier with SNA tools.

Radhi, Iraqi Journal of Science, 2022, Vol. 63, No. 2, pp: 740-758

Future work
Future work on the proposed system can follow these proposals: a. Integrating the proposed framework with various social media platforms. b. Assigning an intelligent agent to recognize Arabic social network content (words or phrases) and to mine valuable data from huge datasets. c. Adopting alternative machine learning classification techniques. d. Optimizing accuracy using a deep learning approach.