An Artificial Intelligence-based Proactive Network Forensic Framework

is at an all-time high in the modern period, and the majority of the population uses the Internet for all types of communication. This flexibility is a great advantage, but as a result of this trend, hackers have become increasingly focused on attacking systems and networks in numerous ways. When a hacker commits a digital crime, it is examined reactively, which aids in identifying the perpetrators. In the modern period, however, waiting for an attack to occur is no longer acceptable: users expect to be able to predict a cyberattack before it damages the system. This can be accomplished with the proactive forensic framework presented in this study. The proposed system combines a reactive and a proactive framework. The proactive part uses machine learning-based classification algorithms to forecast the attack. Once an attack has been predicted, the reactive part of the framework is used to investigate who is attempting to initiate it. The system further emphasizes integrity and confidentiality through an encryption method that encrypts the proactive module's report, which is then decrypted in the reactive module. The proposed elliptic curve cryptography (ECC)-based security model is compared with several existing security methods. Multiple machine learning-based classification algorithms are also compared to determine which is most suitable for the proposed Network Forensic Framework, using accuracy, recall, precision, and F1 value as performance metrics. According to the analysis, the proposed Network Forensic Framework is best implemented using the Extreme Gradient Boosting (XGB) technique.


Introduction
Network forensics is the capture, recording, and analysis of network traffic suspected of being involved in cyberattacks, with the aims of detecting faults in the network and the existing IT infrastructure and of tracing attacks back to their source so that cybercriminals can be prosecuted [1]. Network forensics is a subfield of digital forensics. The rapid growth of Internet connectivity has been accompanied by a rise in network-based crime, forcing law enforcement agencies and organizations to conduct specialized investigations. The process involves capturing, recording, and analyzing network events, identifying access to computer programs, and searching for evidence of such access. Because a skilled attacker can detect traffic monitoring on the network, network forensics requires expertise and resilience. It helps the investigator trace the causes and effects of an attack, but it faces many challenges, including time, speed, accuracy, storage location, and performance. The biggest challenge to network security is legal reliability; networks need to be configured, maintained, and updated [2]. The purpose of network forensics is to provide sufficient evidence, drawn from the traffic exchanged between computing devices, for perpetrators to be successfully prosecuted (e.g., for hacking, fraud, data theft, software piracy, or pornographic publication), and to create evidence-based authentication records about planned attempts to disrupt services, or to help prevent data breaches [2][3].
ISSN: 0067-2904 Abirami and Palanikumar Iraqi Journal of Science, 2023, Vol. 64, No. 11, pp: 5896-5911
Despite many years of research, network forensics remains a young science in which many questions are still open. Network security protects systems, detects potential attack patterns, and monitors the network 24 hours a day, seven days a week. Network forensic analysis can be performed in real time as long as the necessary resources and infrastructure are available to handle the traffic while it is being analyzed [2][3].
The network forensic investigation (NFI) process has two phases: online and offline. The online phase covers the capture and recording of network packets, and the subsequent analysis is performed in the offline phase; both are important data retrieval methods. Because a criminal investigation is involved, a framework must be followed; the basic framework has three stages: preparation, investigation, and presentation [4]. The word cryptography comes from the Greek for "secret writing." Cryptography protects data from theft by third parties and supports user authentication by transforming data into an encrypted form and back again, so that only designated users can view, access, and process it. There are two types: symmetric-key and asymmetric-key cryptography [5][6].
Machine learning (ML) is an important topic because it allows machines to be trained and to run their programs with minimal human intervention, and the learned model is updated automatically as the machine operates. ML models are built from reliable data using many techniques, and the machines are trained on that data. In conventional programming, a hand-written program is applied to the input data; in machine learning, the data and the expected outputs are supplied as input, and the system learns the program from them. Machine learning systems can also read and monitor network data to distinguish normal from anomalous behavior. However, two obstacles remain: high numbers of false alarms and the difficulty of finding the source of an attack [7][8][9][10][11].
The main contribution of this paper is a proposed network forensic model. Six machine learning-based algorithms are used to analyze and evaluate network-based cyberattacks: decision tree (DT), K-nearest neighbors (KNN), gradient boosting machine (GBM), random forest classifier (RFC), extreme gradient boosting (XGB), and artificial neural networks (ANN).
The remainder of the paper is organized as follows. Section I reviews the literature. Section II describes the proposed network forensic framework and the flow of active network forensic investigations, with an emphasis on transmitting encrypted messages from one user to another. Section III covers the role of machine learning in network forensics, the machine learning algorithms tested in the novel lab setup, the comparative analysis, and the existing network forensic dataset. Finally, Section VI concludes the paper and suggests directions for future research.

Literature Survey
Because of rapid technological growth, intruders use new and advanced techniques to launch attacks. It is therefore important to develop a framework that coordinates investigative methods and recording systems, stores and interprets large amounts of real-time data, and communicates with management in accordance with the organization's policy. Network forensics has two types of investigation: reactive and proactive. A reactive investigation begins after an incident has occurred, to determine the cause of the attack [1]. The biggest problem with reactive network forensic frameworks is that the investigation begins only after the incident, which makes it very difficult to identify the exact source of the attack and pass it on to legal authorities. A proactive method, by contrast, detects live attacks with minimal human intervention. Some of the available reactive network frameworks are illustrated in Figure 1. Proactive investigation provides more reliability and accuracy in real time when an attack occurs. Early detection reduces the possibility of evidence distortion while speeding up the processing of findings, identifying patterns of attack proclivity, and preserving the evidence in real time [28].

Reactive Framework (Figure 1): Generic Investigation Framework [12]; Abstract Digital Forensics [13]; Integrated Digital Investigation Process [14]; End-to-End Digital Investigation [15]; Incident Response Methodology [16]; Enhanced Integrated Digital Investigation Process [17]; Event-Based Digital Forensic Investigation [18]; Extended Model of Cyber Crime Investigations [19]; Hierarchical Framework for Digital Investigations [20]; Modeling the Network Forensics Behaviors [21]; Framework for a Digital Forensic Investigation [22]; Integrating Forensic Techniques [23]; Digital Forensics Investigation Framework that Incorporates Legal Issues [24]; Two-Dimensional Evidence Reliability Amplification Process Model

Many proactive frameworks have been proposed by different authors, but their implementation is still pending. The proposed proactive frameworks are depicted in Figure 2. Grobler et al. [29] proposed a proactive network forensic framework, the multicomponent view of the digital forensics framework. The components of this multicomponent view are extremely challenging to apply in the various phases and to obtain an efficient outcome from.
Alharbi et al. [30] defined proactive and reactive functional processes. Compared with the multicomponent view of the digital forensics framework, these processes are well defined and include components designed to automate the output, but the framework's problems have yet to be solved. Rahayu et al. [31] presented a mapping process in digital forensics that reduces the redundancy found in the components of the other proposed systems. Kaur et al. [32] proposed the Network Forensic Process Model and Framework to eliminate unneeded phases, outlining the processes in a more precise and detailed manner. However, the tools and techniques used in each step of this generic process model for network forensics are not fully specified, which makes implementation extremely difficult. Barik et al. [33] proposed the Functional Process Model for Proactive and Reactive Digital Forensics Framework; its functionality is restricted because it does not address all anti-forensic tactics, which is the framework's main flaw. Mohammad Rasmi et al. [34] proposed a New Cyber Crime Resolving Approach, which is also a proactive framework; its phases are not well articulated, making implementation impossible.

Proactive frameworks (Figure 2): A Multi-component View of Digital Forensics [29]; Proactive and Reactive Functional Process [30]; Mapping Process in Digital Forensics [31]; Generic Process Model for Network Forensics [32]; Functional Process Model for Proactive and Reactive Digital Forensics [33]; New Cyber Crime Resolving Approach [34]

Reactive frameworks are ineffective and inefficient because they only react after the damage has occurred. The proactive frameworks proposed so far have not been implemented and, as noted above, they have many flaws. Artificial intelligence plays a vital role in making a proactive framework successful [35]: it can predict cyberattacks by building a cyberattack prediction model from the collected network packets [36].
An artificial intelligence-based framework is proposed that requires little user intervention and solves most problems by providing good training based on the model and the dataset [37].Various machine learning algorithms are incorporated into the framework to classify the network packets that are accumulated and captured during the live data transmission [38][39].
Kumar et al. [40] proposed an intrusion detection system using a decision tree algorithm, trained to classify anomaly and misuse attacks. The intrusion detection systems available on the market are signature-based, which means they cannot find unknown attacks; the decision tree-based system gives better results than signature-based detection. Wazirali et al. [41] developed an intrusion detection system based on a semi-supervised learning method using the k-nearest neighbors algorithm. This method optimized the outcome through cross-validation and hyperparameter tuning to achieve a high accuracy rate with a minimal false-positive rate, yielding a precision of 0.95 and a recall of 0.92. Verma et al. [42] proposed a network-based intrusion detection system using the NSL-KDD dataset. The classification algorithms used in their implementation are XGBoost and AdaBoost, both of which yield better results than existing systems. Farnaaz et al. [43] built an intrusion detection system using the Random Forest (RF) classification algorithm. The system categorizes attacks into four types: DoS, U2R, probe, and R2L. The authors used 10-fold cross-validation with the Random Forest algorithm and applied feature selection to remove duplicated data and irrelevant attributes from the dataset. Like many other studies, this approach uses the NSL-KDD dataset. According to the reported results, the classifier achieved better accuracy, detection rate, false alarm rate, and Matthews correlation coefficient. Shenfield et al. [44] presented an intrusion detection system that uses an artificial neural network to detect malicious network packets. According to the reported results, its accuracy compares well with other methods and its false alarm rate is very low, so the approach can significantly improve the effectiveness of intrusion detection systems.
The proactive network forensic model needs a classifier to separate malicious from nonmalicious network packets. According to the survey, classification using Decision Tree (DT), K-Nearest Neighbors (KNN), Gradient Boosting Machine (GBM), Random Forest Classifier (RFC), Extreme Gradient Boosting (XGB), and Artificial Neural Network (ANN) provides better results.

Proposed Network Forensic Framework
According to the study, a reactive network forensic inquiry begins only after a cyberattack has been launched and the system has been damaged. A proactive network framework is more effective because it anticipates a cyberattack by gathering live packets and denying harmful packets access to the network. The proposed system is a network forensic architecture that combines proactive and reactive capabilities. In the proactive part, machine learning-based classification methods predict harmful packets; this component detects cyberattacks in live network traffic and conducts a basic investigation. The proactive forensic report is then forwarded to the reactive forensic section for further examination of the cyberattack. To ensure the report's integrity and confidentiality, it is encrypted before it is delivered to the reactive component. The proposed algorithm provides confidentiality through elliptic curve cryptography (ECC) and integrity through hashing and a digital signature. In terms of confidentiality, it has been demonstrated in [45][46] that ECC with Koblitz encoding improves security. The MD5 hashing function can be used in the proposed algorithm because it is one of the fastest hashing methods; its security gaps are covered by the other modules of the proposed algorithm. The message produced by the Koblitz encoder module is encrypted using ECC, a hash is then generated using MD5, and the message is digitally signed for additional security. The receiver performs the reverse operations. Table 1 compares the existing security models with the proposed model. Under the basic (reactive) course of action, only incomplete information is available for investigation, data integrity is hard to guarantee, and it is difficult to present complete evidence to law enforcement authorities. To overcome those challenges, we propose the new framework shown in Figure 3, which implements an effective and efficient investigation process. In the proposed framework, network traffic is collected from a variety of sources, reduced to a minimum by removing unwanted data, and useful features are extracted in the processing unit.
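The encryption pipeline described in this section (Koblitz encoding, then ECC encryption, then an MD5 hash checked on the receiver side) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the curve parameters (secp256k1), the padding factor K = 100, the use of EC-ElGamal as the ECC encryption scheme, and hashing the ciphertext's repr() with MD5 are assumptions made here, and the digital-signature step is omitted. It requires Python 3.8+ for pow(x, -1, p), and it is a teaching sketch, not production cryptography.

```python
import hashlib
import secrets

# Assumed curve: secp256k1, y^2 = x^3 + 7 over GF(P). Since P % 4 == 3, a square
# root of a quadratic residue r is r^((P + 1) / 4) mod P.
P = 2**256 - 2**32 - 977
A, B = 0, 7
K = 100  # hypothetical Koblitz padding factor: up to K x-candidates per message


def ec_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def koblitz_encode(m):
    """Map integer m to a curve point whose x-coordinate lies in [m*K, m*K + K)."""
    if (m + 1) * K >= P:
        raise ValueError("message too long for one point")
    for j in range(K):
        x = m * K + j
        rhs = (pow(x, 3, P) + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)
        if y * y % P == rhs:
            return (x, y)
    raise ValueError("no curve point found; increase K")


def koblitz_decode(point):
    return point[0] // K


# secp256k1 has prime order, so any finite point generates the whole group.
G = koblitz_encode(1)


def keygen():
    d = secrets.randbelow(P - 2) + 1
    return d, ec_mul(d, G)


def _digest(c1, c2):
    # Integrity tag over the ciphertext (MD5, as named in the text).
    return hashlib.md5(repr((c1, c2)).encode()).hexdigest()


def encrypt(report: bytes, pub):
    """EC-ElGamal: C1 = kG, C2 = M + kQ, where M is the Koblitz-encoded report."""
    M = koblitz_encode(int.from_bytes(report, "big"))
    k = secrets.randbelow(P - 2) + 1
    c1, c2 = ec_mul(k, G), ec_add(M, ec_mul(k, pub))
    return c1, c2, _digest(c1, c2)


def decrypt(c1, c2, digest, priv) -> bytes:
    """Check the MD5 tag, then recover M = C2 - d*C1 and Koblitz-decode it."""
    if _digest(c1, c2) != digest:
        raise ValueError("integrity check failed")
    x, y = ec_mul(priv, c1)
    m = koblitz_decode(ec_add(c2, (x, -y % P)))
    return m.to_bytes((m.bit_length() + 7) // 8, "big")
```

With these parameters a single point carries only about 31 bytes of report, so a full report would have to be split across points or, as is more common in practice, encrypted with a symmetric key that is itself protected by ECC.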
Feature selection is based on the attributes required from the dataset. Each newly observed pattern is compared with the existing patterns in a knowledge base, and its behavioral differences from known behavior are assessed. If a match is found, an immediate response is triggered and the intruder's activity is reported. Input is selected by processing the data collected online from various sources. Finally, the alerts are combined into a single format. The processing unit, shown in Figure 4, employs machine learning, pattern classification, and knowledge bases.
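The matching step described above (compare a new pattern with a knowledge base, check for behavioral deviation, and respond immediately on a match) could look roughly like the following sketch. The event fields, the two knowledge-base entries, and the z-score deviation test with a threshold of 3 are hypothetical choices for illustration; the paper does not specify how patterns or behavioral differences are represented.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Event:
    protocol: str
    dst_port: int
    flag: str
    src_bytes: int


# Hypothetical knowledge base: known malicious pattern -> attack class.
KNOWN_PATTERNS = {
    ("tcp", 23, "S0"): "probe",
    ("icmp", 0, "SF"): "dos",
}


def build_baseline(normal_src_bytes):
    """Behavioral baseline (mean, sample std) learned from normal traffic."""
    return mean(normal_src_bytes), stdev(normal_src_bytes)


def inspect(event, baseline, z_threshold=3.0):
    """Return an alert dict on a knowledge-base match or a large deviation, else None."""
    key = (event.protocol, event.dst_port, event.flag)
    if key in KNOWN_PATTERNS:
        return {"type": "signature", "label": KNOWN_PATTERNS[key], "event": event}
    mu, sigma = baseline
    if sigma > 0 and abs(event.src_bytes - mu) / sigma > z_threshold:
        return {"type": "anomaly", "label": "unknown", "event": event}
    return None
```

A signature match gives an immediate, labeled alert, while the deviation check is what lets the unit flag patterns that are not yet in the knowledge base.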
The processed data is passed to the network forensic investigation unit for analysis, and an alert-based system is proposed. If suspicious activity is detected, the user is notified through the default email program and an initial report is generated. This initial report serves as input to the proactive investigation process. For the investigation process, we propose a framework based on organizational approval: the investigation begins only after approval has been obtained from the relevant authorities. We also provide secure communication using encryption, with an additional layer of security based on two-factor authentication, when confidential information is transmitted as input to the proactive network forensic analysis unit. Information in a confidential report should not be available to all employees of the organization, and it must not be altered; otherwise its credibility may be lost and it may be difficult to use as evidence. Figure 5 shows the proposed network forensic process model. Based on the report, further investigation is conducted and a final confidential report is produced, on which a decision is made. If there is a discrepancy, the case can be re-investigated as required. When a report is transferred from one user to another, it must be in a secure, encrypted format so that unauthorized users cannot access it.

Artificial Intelligence in Network Forensic
Machine learning, which comes from the field of artificial intelligence, is considered its backbone, and its adoption in digital forensics has therefore been given a prominent place. Various methods and algorithms are used in machine learning for forensic analysis. Machine learning-based prediction involves seven steps, which are represented in Figure 6 [51].

Network Forensic Infrastructure
Figure 7 illustrates the lab setup used for the investigation. The proposed infrastructure comprises four elements: traffic log collection; network packet feature selection; machine learning-based algorithms for prediction; and evaluation parameters with detailed analysis.

Traffic Log Collection
A Graylog server is used to capture the organization's logs. Graylog is open-source, freely available software. Logs are collected using packet-based, flow-based, and session-based detection. The proposed log collection method extracts key parameters from the logs and associates them with sixteen predefined categories stored in a MySQL database. In the preprocessing stage, the alerts captured by the Syslog server are taken as input. Next, the data processing stage integrates the alerts into a single format as organized, labeled alerts.
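As a rough illustration of integrating alerts from different sources into a single labeled format, the sketch below maps a syslog line and a flow record onto one schema and assigns a category by keyword. The three categories (standing in for the paper's sixteen, which are not listed), the keywords, the field names, and the assumption that syslog timestamps contain no spaces (ISO 8601 style) are all hypothetical; a Python dict stands in for the MySQL table.

```python
import re

# Hypothetical subset standing in for the sixteen predefined categories.
CATEGORY_KEYWORDS = {
    "authentication_failure": ("failed password", "authentication failure"),
    "port_scan": ("port scan", "syn scan"),
    "denial_of_service": ("flood", "denial of service"),
}

# Assumes "<iso-timestamp> <host> <program>[pid]: <message>".
SYSLOG_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<prog>[\w\-/]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def categorize(message):
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "uncategorized"


def normalize_syslog(line):
    """Parse one syslog line into the common alert schema."""
    m = SYSLOG_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized syslog line: {line!r}")
    return {"timestamp": m["ts"], "source": m["host"], "sensor": m["prog"],
            "message": m["msg"], "category": categorize(m["msg"])}


def normalize_flow(record):
    """Map a flow-based alert record onto the same schema."""
    msg = record.get("alert", "")
    return {"timestamp": record["start"], "source": record["src_ip"], "sensor": "flow",
            "message": msg, "category": categorize(msg)}
```

Once every source produces the same keys, downstream labeling and machine learning stages can treat all alerts uniformly.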

Feature Selection
An information gain mechanism is used to select the features of the dataset. The logs are captured based on 6 features, and the traffic is classified into five different classes, as shown in Table 2. Figure 8 shows the correlation matrix of the 16 KDD dataset features on which the analysis is based: of the 41 attributes in the KDD dataset, these 16 contribute most to the classification performed by the proposed network forensic model.
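Information gain scores an attribute by how much it reduces the entropy of the class label. A minimal pure-Python sketch of ranking attributes and keeping the top k follows; it assumes discrete (already binned) attribute values, and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(labels) minus the weighted entropy after splitting on each attribute value."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, names, k):
    """Rank attributes by information gain and keep the k best."""
    gains = {name: information_gain([r[i] for r in rows], labels)
             for i, name in enumerate(names)}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

In this toy form, an attribute that perfectly separates normal from attack records gets a gain of 1 bit, while a constant attribute gets 0.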

Machine Learning Algorithms
A study was carried out to choose the best classification algorithm for the proposed network forensic model. The machine learning algorithms considered are described in Table 2.

Decision Tree [52]
The decision tree (DT) is a supervised learning algorithm. It analyzes data in stages and builds a flowchart-like structure: the root represents the attribute that plays the primary role in the classification, and each leaf assigns a class.
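To make the root-to-leaf structure concrete, here is a small sketch of a depth-limited binary decision tree that chooses each split by minimizing weighted Gini impurity over numeric thresholds. Gini impurity, the max_depth default, and the tuple-based node layout are assumptions for illustration (information gain would work equally well as the split criterion); class labels are assumed to be scalars, not tuples.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Internal node: (feature, threshold, left, right); leaf: class label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all rows identical: cannot split further
        return majority
    _, f, t = split

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth)

    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, child(left), child(right))


def predict(node, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node
```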

The k-Nearest Neighbours [53]
The k-nearest neighbors (KNN) algorithm is a supervised machine learning algorithm that can be used for both classification and regression problems. It classifies a data point by examining the surrounding data points and assigning it to the group to which most of its nearest neighbors belong.
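The nearest-neighbor vote can be written in a few lines. This sketch assumes numeric feature vectors compared by Euclidean distance (math.dist, Python 3.8+) and an odd k to reduce ties; in practice, features would normally be scaled first so that no single attribute dominates the distance.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```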

Gradient Boosting Machine [54]
Boosting is the process of converting weak learners into a strong learner. Gradient boosting builds many models gradually, in an additive and sequential manner.
GBM combines the predictions of multiple decision trees to make the final prediction.
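The additive, sequential idea behind gradient boosting can be sketched with regression stumps fitted to residuals. This is a simplified illustration, not the configuration used in the paper: it uses squared loss on 0/1 labels (so the negative gradient is simply the residual y - F), depth-1 trees, and a 0.5 decision threshold, all chosen here for brevity.

```python
def fit_stump(X, r):
    """Depth-1 regression tree minimising squared error on the residuals r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean_r = sum(r) / len(r)
        return lambda x: mean_r
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Additive model F(x) = base + learning_rate * sum(stump_m(x))."""
    base = sum(y) / len(y)
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, F)]  # negative gradient of 0.5*(y-F)^2
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump(x) for fi, x in zip(F, X)]
    return base, stumps, learning_rate


def gbm_predict(model, x, threshold=0.5):
    base, stumps, learning_rate = model
    score = base + learning_rate * sum(s(x) for s in stumps)
    return 1 if score >= threshold else 0
```

Each round corrects part of what the previous rounds got wrong; the small learning rate is what makes the process "slower" and more gradual.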

Random Forest Classifier [55]
Random forest is a flexible, easy-to-use machine learning algorithm that can be used for both classification and regression. Random forest (RF) is an ensemble classifier used to improve accuracy. It contains many decision trees, has a low classification error rate, and is linked to various other classification algorithms.
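A random forest combines bootstrap sampling of the training rows, a random subset of features at each split, and a majority vote across trees. The sketch below shows those three ingredients; the tree depth, the number of trees, the sqrt(#features) subset size, and the fixed seed are illustrative assumptions rather than the paper's settings.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, rng, n_feats, depth=0, max_depth=4):
    """Decision tree whose splits only consider n_feats randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best

    def child(idx):
        return grow_tree([X[i] for i in idx], [y[i] for i in idx],
                         rng, n_feats, depth + 1, max_depth)

    return (f, t, child(left), child(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # sqrt(#features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, n_feats, 0, max_depth))
    return forest


def forest_predict(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

Because each tree sees a different sample and different features, individual errors tend to cancel out in the vote, which is the source of the low error rate mentioned above.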

Extreme Gradient Boosting [56]
Extreme gradient boosting (XGB) uses gradient descent to minimize the loss when adding new models. It is an implementation of gradient-boosted decision trees designed for speed and performance.
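In XGBoost's regularized formulation, each new tree is scored with first- and second-order gradient statistics: a leaf's optimal weight is -G/(H + lambda), and a split's gain is half the improvement in G^2/(H + lambda) minus gamma, where G and H are the sums of gradients and Hessians of the records in a node. The sketch computes these quantities for logistic loss; lambda = 1 and gamma = 0 are default-style values assumed here, and the helper names are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def grad_hess(y, margin):
    """Gradient and Hessian of the logistic loss w.r.t. the raw margin (score)."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Worked example: four records, current margin 0 for all, split {0, 1} | {2, 3}.
labels = [0, 0, 1, 1]
g, h = zip(*(grad_hess(yi, 0.0) for yi in labels))
GL, HL, GR, HR = sum(g[:2]), sum(h[:2]), sum(g[2:]), sum(h[2:])
gain = split_gain(GL, HL, GR, HR)                           # ≈ 2/3
w_left, w_right = leaf_weight(GL, HL), leaf_weight(GR, HR)  # ≈ -2/3 and +2/3
```

The negative left weight pushes the normal records' scores down and the positive right weight pushes the attack records' scores up, which is exactly the loss-lowering step the text describes.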

Artificial Neural Network [57]
The artificial neural network (ANN) is a computational paradigm that simulates the activity of nerve cells in the human brain. ANNs play an important role in machine learning (ML) and support the broad field of artificial intelligence (AI). The multilayer perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs.
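A multilayer perceptron is a stack of fully connected layers, each applying a weighted sum plus bias followed by an activation. The forward pass below uses hand-set weights (a hypothetical two-layer network that computes XOR) rather than trained ones; training by backpropagation is omitted to keep the sketch short.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weights[j] holds the input weights of neuron j."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, layers):
    """Feed-forward pass: each layer is (weights, biases, activation)."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical hand-set weights: hidden h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1);
# output = sigmoid(10*h1 - 20*h2 - 5), which is close to 1 only for XOR-true inputs.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[10.0, -20.0]], [-5.0], sigmoid),
]
```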

Performance Evaluation using Machine Learning Algorithms
Recall, precision, accuracy, and the F1 value are calculated to measure the performance of the different machine learning algorithms [58]. A data point is a true positive (TP) when both its actual and predicted classes are 1, and a true negative (TN) when both are 0. It is a false positive (FP) when the actual class is 0 and the predicted class is 1, and a false negative (FN) when the actual class is 1 and the predicted class is 0. The formulas for these metrics and their descriptions are given in Table 3.
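The four outcome types above translate directly into counts, from which accuracy, precision, recall, and F1 follow. A minimal sketch, assuming binary labels with 1 as the positive (attack) class and returning 0.0 when a denominator is zero (a convention chosen here):

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```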

F1 Value
It is the harmonic mean of precision and recall.
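Written out, F1 is the harmonic mean of precision P and recall R, and it can also be expressed directly in terms of the confusion-matrix counts (TP, FP, FN). The worked numbers (P = 0.9, R = 0.8) are illustrative, not results from this study.

```latex
F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R},
\qquad P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}
\;\Rightarrow\; F_1 = \frac{2\,TP}{2\,TP + FP + FN}

\text{Example: } P = 0.9,\ R = 0.8 \;\Rightarrow\;
F_1 = \frac{2(0.9)(0.8)}{0.9 + 0.8} = \frac{1.44}{1.7} \approx 0.847
```

Because the harmonic mean is pulled toward the smaller value, F1 (about 0.847 here) sits below the arithmetic mean of 0.85.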
Of the dataset, 75% was used for training and the remaining 25% for testing; the 75:25 ratio was chosen because it gave better accuracy on the dataset considered. The dataset contains 7992 records, of which 4302 (53.83%) are labeled normal and 3690 (46.17%) are labeled attacks. Experiments show that the Extreme Gradient Boosting (XGB) classifier is the most successful at distinguishing suspicious from normal traffic on the given network: XGB is the most accurate of all the tested algorithms, with an accuracy of 98.23%. The Random Forest classifier (RFC) ranks second with an accuracy of 97.92%, and the Gradient Boosting Machine (GBM) classifier is third with 97.05%. The Decision Tree (DT) algorithm ranks fourth with 96.44%, and K-Nearest Neighbors (KNN) achieved 92.61%. Lastly, the multilayer perceptron (MLP) classifier was less accurate than the other five algorithms, with an accuracy of 88%. The detailed evaluation of the algorithms is shown in Table 3, and the comparison graph is shown in Figure 9.
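The 75:25 split and the class shares reported above can be reproduced with a short sketch. Whether the split was shuffled or stratified is not stated; this version does a plain seeded shuffle (seed 42 is arbitrary), and the synthetic label list simply mirrors the reported counts.

```python
import random
from collections import Counter


def train_test_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and split them train_fraction : (1 - train_fraction)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def class_shares(labels):
    """Percentage of each class, rounded to two decimals."""
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 2) for label, n in counts.items()}


labels = ["normal"] * 4302 + ["attack"] * 3690  # mirrors the reported 7992 records
train, test = train_test_split(labels)
```

With 7992 records, int(7992 * 0.75) gives 5994 training and 1998 test records; a stratified split would additionally keep the 53.83:46.17 ratio inside each part.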

Conclusion
Cybercrime is on the rise because all types of activity, including commerce and education, are now conducted over the Internet, and this significant shift allows hackers to carry out a wide variety of attacks. Finding proof and identifying the hacker is of little use once the crime has been carried out and the system has been harmed, yet existing forensic models are reactive. The proposed system is designed to predict an attack before it occurs. This study proposes a proactive forensic framework with a security layer that is made more secure by an ECC-based algorithm; a comparative analysis with existing systems shows that the proposed security layer is stronger. The system consists of both reactive and proactive sections, and machine learning-based classification algorithms are used to predict the initial attack packet. A comparative analysis of Decision Tree (DT), K-Nearest Neighbors (KNN), Gradient Boosting Machine (GBM), Random Forest Classifier (RFC), Extreme Gradient Boosting (XGB), and Artificial Neural Network (ANN) classifiers was performed to determine the best machine learning-based algorithm. Based on accuracy, precision, recall, and F1 score, Extreme Gradient Boosting (XGB) produces the best outcome, with an accuracy of 98.23%.

Figure 3: Proposed Network Forensic Model

Figure 4: The Processing Unit

Figure 5: Proposed Network Forensic Investigation Model with Proactive and Reactive Phases

Figure 6: Machine Learning based Prediction System

Figure 7: Lab setup for the proposed model

Traffic classes:
… make network resources unavailable to intended users.
3 probe: An action taken or an object used to learn something about the state of the network.
4 r2l: Gaining unauthorized access to a victim machine.
5 u2r: Illegally obtaining root privileges.

Figure 9: Comparative Analysis of Machine Learning Algorithms

Table 1: Comparison of proposed encryption model with the existing system

Table 2: Machine Learning Algorithms

Table 3: Performance Metrics and its Description