Copy Move Forgery Detection Using Forensic Images

Digital images are open to many kinds of manipulation, owing to the falling cost of compact cameras and mobile phones and the availability of powerful image-editing tools. Image credibility has therefore become doubtful, particularly where photos carry weight, for instance in news reports, insurance claims, and criminal courts. Image forensic methods therefore assess the integrity of an image by applying the various highly technical methods established in the literature. The present work deals with copy-move forgery images from the Media Integration and Communication Center Forgery (MICC-F2000) dataset, with the aim of detecting and revealing the tampered areas in each image. The image is first partitioned into non-overlapping segments using the Simple Linear Iterative Clustering (SLIC) method. The Scale Invariant Feature Transform (SIFT) descriptor is then applied to the grey version of the image to obtain distinctive keypoints, which are classified by a K-Nearest Neighbor (KNN) classifier to detect and localize the forged portion of the tampered image. The forgery detection results gave a performance of about 98%, which reflects the ability of the KNN classifier, combined with the SIFT descriptor, to detect the forged portions even when the forged area is rotated, scaled, or both.


Sewan and Altaei
Iraqi Journal of Science, 2021, Vol. 62, No. 9, pp: 3167-3181

image using the common formula (intensity, or luminance) [13]. That is, the grey intensity image (I) is computed from the three color bands using the luminance formula as follows: (1)
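Eq. (1) itself did not survive extraction. As a minimal sketch, assuming the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which may differ from the exact coefficients used by the authors, the 24-bit to 8-bit grey conversion can be written as follows; `rgb_to_gray` and `image_to_gray` are hypothetical helper names.

```python
def rgb_to_gray(pixel):
    """Map one (R, G, B) pixel with channels in 0..255 to an 8-bit grey level.

    Assumption: BT.601 luma weights; Eq. (1) may use other coefficients.
    """
    r, g, b = pixel
    intensity = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and clamp to the 8-bit range 0..255.
    return max(0, min(255, int(round(intensity))))


def image_to_gray(image):
    """image: list of rows, each row a list of (R, G, B) tuples."""
    return [[rgb_to_gray(p) for p in row] for row in image]


# Usage: a 2x2 image (white, black / red, green).
sample = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
print(image_to_gray(sample))  # [[255, 0], [76, 150]]
```

Each 24-bit pixel collapses to one 8-bit value, which is the reduction the preprocessing section relies on.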

SLIC Image Segmentation
Image segmentation is an important step in digital image processing and computer vision. Many segmentation methods exist; however, it is difficult to make their results agree with human perception. The concept of the superpixel, a perceptually meaningful atomic region, was therefore proposed, and the simple linear iterative clustering (SLIC) superpixel method was later formulated. SLIC is an advanced superpixel method that can be computed very efficiently [14]. With lower processing time and storage cost, the SLIC algorithm achieves better-quality segments than other approaches. The approach is very simple and has a single parameter, K, the desired number of equally sized superpixels to generate [5, 15]. It forms superpixels by grouping pixels in a five-dimensional (labxy) space according to their color similarity and their proximity in the image plane. Here lab refers to the CIELAB color space, a more perceptually accurate color space that specifies colors with three values: L, the lightness axis; a, the green-to-red axis; and b, the blue-to-yellow axis. Given K as input, each superpixel has an average area of approximately Av = N/K, where N is the number of pixels in the input image. Hence, for equally sized superpixels, there is a superpixel center at every grid interval $S = \sqrt{N/K}$. Euclidean distances in the CIELAB color space are perceptually meaningful only for small distances [15]. The distance measure (Ds) is computed in this labxy space, whose CIELAB coordinates are obtained from the XYZ tristimulus values as follows:

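$$L^* = 116\left(\frac{Y}{Y_n}\right)^{1/3} - 16 \tag{3}$$
$$a^* = 500\left[\left(\frac{X}{X_n}\right)^{1/3} - \left(\frac{Y}{Y_n}\right)^{1/3}\right] \tag{4}$$
$$b^* = 200\left[\left(\frac{Y}{Y_n}\right)^{1/3} - \left(\frac{Z}{Z_n}\right)^{1/3}\right] \tag{5}$$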
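where Xn, Yn, and Zn are the XYZ values of the reference white point. Auxiliary definitions are:
$$C^*_{ab} = \sqrt{(a^*)^2 + (b^*)^2} \tag{6}$$
$$h_{ab} = \tan^{-1}\left(\frac{b^*}{a^*}\right) \tag{7}$$
Roughly, the maximum and minimum values of a* correspond to red and green, while b* ranges from yellow to blue. Chroma is a measure of colorfulness: more colorful (more saturated) colors occupy the outside of the CIELAB solid at each lightness level L, and more washed-out (desaturated) colors lie nearer the central achromatic axis. The hue angle roughly expresses what most people mean by "color". The image gradients are computed as follows:
$$G(x, y) = \lVert I(x+1, y) - I(x-1, y) \rVert^2 + \lVert I(x, y+1) - I(x, y-1) \rVert^2$$
where I(x, y) is the lab vector corresponding to the pixel at position (x, y) and ‖·‖ is the Euclidean (L2) norm [15]. SLIC uses this gradient to move each initial cluster center to the lowest-gradient position in its 3 × 3 neighborhood, so that centers are not placed on edges or noisy pixels.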
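The conversions in Eqs. (3)-(7) and the gradient G(x, y) can be sketched in plain Python. This is a minimal illustration, assuming a D65 reference white (the paper does not state which white point is used) and the simplified cube-root form exactly as written above; `xyz_to_lab`, `chroma_hue`, and `slic_gradient` are hypothetical helper names.

```python
import math

# Assumed reference white: CIE D65 (2-degree observer), XYZ scaled so Yn = 100.
WHITE_D65 = (95.047, 100.0, 108.883)


def xyz_to_lab(x, y, z, white=WHITE_D65):
    """Eqs. (3)-(5) in the simplified cube-root form given in the text.

    The full CIE definition uses a piecewise function that is linear for very
    small ratios; that refinement is deliberately omitted here.
    """
    xn, yn, zn = white
    fx, fy, fz = (x / xn) ** (1 / 3), (y / yn) ** (1 / 3), (z / zn) ** (1 / 3)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def chroma_hue(a, b):
    """Eq. (6) chroma C* = sqrt(a*^2 + b*^2); Eq. (7) hue angle in degrees [0, 360)."""
    # atan2 keeps the correct quadrant, unlike a plain tan^-1(b/a).
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360


def slic_gradient(lab, x, y):
    """G(x, y) for an interior pixel; lab[y][x] is an (L, a, b) tuple."""
    def sq_norm(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return sq_norm(lab[y][x + 1], lab[y][x - 1]) + sq_norm(lab[y + 1][x], lab[y - 1][x])
```

The reference white itself maps to L* = 100 with a* = b* = 0, which is a quick sanity check on any implementation of Eqs. (3)-(5).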

Scale Invariant Feature Transform
The Scale Invariant Feature Transform (SIFT) is a method for finding and extracting important features from images. The features should satisfy two main requirements: redundancy in the original image should be avoided, and the dimensionality of the data must be reduced. SIFT locates keypoints (features) in regions of various sizes and measures the dominant orientation of each keypoint. The keypoints detected by SIFT are distinctive points, such as corners and bright points in dark regions (and vice versa), that are largely unaffected by changes in brightness, geometric transformations, and distortion [16]. The scale space (octaves) of an image, L(x, y, σ), is generated by convolving a variable-scale Gaussian, G(x, y, σ), with the source image, I(x, y):
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}$$
where * is the convolution operation in x and y, and σ is the scale parameter.
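The scale-space construction above, together with the difference-of-Gaussian used in the next step, can be sketched in pure Python on a small grey image. This is a didactic sketch, not the authors' implementation: the kernel radius of 3σ, border replication, and the factor k = √2 are assumptions, and the helper names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled G(x, y, sigma) on a (2r+1)x(2r+1) grid with r = ceil(3*sigma).

    The kernel is renormalised so that the truncated weights sum to 1.
    """
    r = max(1, math.ceil(3 * sigma))
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
          for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def scale_space(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), with border pixels replicated.

    The Gaussian is symmetric, so convolution equals correlation here.
    """
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def difference_of_gaussian(image, sigma, k=math.sqrt(2)):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    fine, coarse = scale_space(image, sigma), scale_space(image, k * sigma)
    return [[c - f for f, c in zip(rf, rc)] for rf, rc in zip(fine, coarse)]
```

A flat image stays flat under blurring, so its DoG is zero everywhere; an isolated bright pixel gives a negative DoG at its center, which is the kind of extremum the detector looks for.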

To detect stable keypoint locations in scale space efficiently, SIFT uses the scale-space extrema of the difference-of-Gaussian (DoG) function, an approximation of the second-order derivative, convolved with the image, D(x, y, σ). It is computed from the difference of two nearby scales separated by a constant multiplicative factor k:
$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{11}$$
SIFT also discards low-contrast candidates, for which |D(x̂)| is less than 0.03, and eliminates edge responses. To assign an orientation, a histogram of gradient orientations is computed in the neighborhood of each keypoint; this can produce keypoints with the same location and scale but different orientations. For the smoothed image L(x, y, σ) at the keypoint's scale, the gradient magnitude m(x, y) and orientation θ(x, y) are precomputed from pixel differences [18]:
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
Each keypoint is then described by (x, y, σ, f), where (x, y) are its image-plane coordinates, σ represents its scale, and f contains the final descriptor.
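The magnitude and orientation formulas above, and the 0.03 contrast test, translate directly into code. This is a minimal sketch with hypothetical helper names; it assumes the smoothed image is indexed as `L[y][x]` and that intensities are scaled to [0, 1], which is the setting in which Lowe's 0.03 threshold applies.

```python
import math


def gradient_magnitude_orientation(L, x, y):
    """m(x, y) and theta(x, y) (degrees in [0, 360)) from central pixel
    differences of the smoothed image L; (x, y) must be an interior pixel."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    magnitude = math.sqrt(dx * dx + dy * dy)
    # atan2 resolves the quadrant that a plain tan^-1(dy/dx) would lose.
    orientation = math.degrees(math.atan2(dy, dx)) % 360
    return magnitude, orientation


def is_low_contrast(d_value, threshold=0.03):
    """Reject a DoG extremum whose |D| is below the threshold
    (assumes intensities scaled to [0, 1])."""
    return abs(d_value) < threshold
```

On a horizontal intensity ramp the gradient points along +x (0°); on a vertical ramp it points along +y (90°).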

Image Matching
During copy-move forgery, parts of an image are copied and moved within the same image, so there is a strong correlation between these parts, which can be used as evidence of forgery. The main challenge, however, is to identify effective features and matching algorithms for finding the associated regions. Feature matching is carried out to find strong similarities between feature descriptors; if such similarity between descriptors is found, it is interpreted as an indication of duplicated regions [10]. Several methods can be used to identify these similarities, such as the KNN method, a supervised learning method. The first stage of KNN is to choose the parameter k, the number of nearest neighbors. Next, the distance between the query (test sample) and every training sample is calculated using the Euclidean distance:
$$ED = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \tag{14}$$
where ED is the distance, Xi is the i-th component of the training sample, Yi is the i-th component of the test sample, and n is the feature dimension. The distances are then sorted to determine the k nearest neighbors. The final stage applies a simple majority vote among these neighbors to determine the predicted class [19,20].
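The KNN steps just described (choose k, compute Eq. (14) to every training sample, sort, take a majority vote) can be sketched as below. The function names, labels, and toy vectors are illustrative assumptions, not the paper's data; in the proposed method the same distance is applied to SIFT descriptors.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Eq. (14): ED = sqrt(sum_i (X_i - Y_i)^2)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(training, query, k=3):
    """training: list of (feature_vector, label) pairs.

    1) compute the distance from the query to every training sample,
    2) sort and keep the k nearest, 3) return the majority label.
    Ties are broken in favour of the label seen first among the nearest.
    """
    nearest = sorted(training, key=lambda item: euclidean_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Usage with toy 2-D vectors.
train = [((0, 0), "authentic"), ((0, 1), "authentic"),
         ((5, 5), "forged"), ((6, 5), "forged"), ((5, 6), "forged")]
print(knn_predict(train, (0.2, 0.2)))  # authentic
```

An odd k avoids ties in the two-class case, which is one practical reason small odd values such as k = 3 are common.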

Contribution
The motivation behind the present paper is to determine both the original and the copied locations in an image. Because this issue has been neglected by previous literature, the present research focuses on this particular point. Moreover, determining the copy-move location in an image is of greater interest and requires a comprehensive study of the image contents. The contribution of the current work is a verification step applied to the resulting image: the detected regions are compared with the ground-truth mask, which gives a more stringent check of the places that have been manipulated. The employed method will be compared with and verified against existing state-of-the-art methods in terms of effectiveness, robustness, matching time complexity, detection reliability, and forgery localization accuracy, which are useful for verifying the authenticity and integrity of digital images.

Proposed Forgery Image Detection (FID) Method
The general structure of the proposed forgery image detection method is depicted in Figure-3. It contains two main stages. The first is the forged image detection (FID) stage, in which the image is first segmented using the SLIC method and the SIFT descriptor is then applied to the gray-converted image to find the significant features. These features are then matched across image segments by a KNN classifier, which decides whether a forged segment exists and where it is located. Algorithm (1) illustrates the main stages of the proposed method. The proposed FID stage consists of several sequential steps: first, the input color image is segmented by the SLIC method into several non-uniform image parts; then the segmented image is converted from the RGB color bands into a single gray-scale image, which is input to SIFT feature extraction to extract the dominant features of each part of the image. The features of each segment are then compared with those of the other image segments, and the KNN classifier detects image segments that show the same features and declares them similar (matched). The following parts give further detail about the sequential steps of the proposed FID stage.

Algorithm (1)
Input: Color image
Output: D // Decision (forged or authentic image)
Procedure
Step 1: Read the color image.
Step 3: Gray conversion as in Eq. (1) // Convert the image to gray scale.
Step 6: Matching by KNN as in Eq. (14) // The Euclidean distance is computed only between keypoints of two similar clusters.
Step 7: Decision // If no keypoints match, the image is authentic; else, the image is forged.
Step 8: Investigation by comparing the output image with the mask using Intersection over Union (IOU) // If IOU >= 0.5, the detection is good; else, it is bad.
End
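Step 7 can be sketched as a small function. This assumes a hypothetical data layout not given in the paper: `matches` holds keypoint-index pairs accepted by the KNN stage, and `segment_of` maps each keypoint index to its SLIC segment label. It also assumes that only matches linking two different segments count as copy-move evidence, since a segment trivially resembles itself.

```python
def decide(matches, segment_of):
    """Step 7 sketch.

    matches:    iterable of (i, j) keypoint-index pairs accepted by the KNN stage.
    segment_of: maps a keypoint index to its SLIC segment label.
    Assumption: a match only counts as evidence of copy-move forgery when it
    links keypoints lying in two different segments.
    """
    cross_segment = [(i, j) for i, j in matches if segment_of[i] != segment_of[j]]
    decision = "forged" if cross_segment else "authentic"
    return decision, cross_segment


# Usage: keypoints 0 and 1 share segment 1; keypoint 2 lies in segment 4.
seg = {0: 1, 1: 1, 2: 4, 3: 7}
print(decide([(0, 1), (1, 2)], seg))  # ('forged', [(1, 2)])
```

Returning the cross-segment pairs alongside the decision keeps the evidence needed for the Step 8 localization check.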

SLIC Image Segmentation
Simple Linear Iterative Clustering (SLIC) is an adapted clustering method by which image pixels are grouped into superpixels. With low processing time and memory cost, the SLIC algorithm provides better-quality segments than other state-of-the-art methods. The algorithm essentially has a single parameter, k, the number of similarly sized superpixels of (N*N) size. Figure-4 provides an example of SLIC superpixel segmentation, in which (a) shows the forgery image and (b) displays the output of the SLIC superpixel segmentation method.
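The grid interval S and the labxy distance that SLIC minimizes can be sketched as follows. This follows the standard SLIC formulation, in which D combines the CIELAB distance d_c with the spatial distance d_s normalized by S; the compactness weight m = 10 is an assumed value not stated in this paper, and the helper names are hypothetical.

```python
import math


def grid_interval(n_pixels, k_superpixels):
    """S = sqrt(N / K): spacing between the initial superpixel centres,
    so that each superpixel covers roughly N / K pixels."""
    return math.sqrt(n_pixels / k_superpixels)


def slic_distance(p, q, S, m=10.0):
    """Distance between two points p, q given as (l, a, b, x, y).

    d_c: Euclidean distance in CIELAB; d_s: Euclidean distance in the image plane.
    D = sqrt(d_c^2 + (d_s / S)^2 * m^2), with an assumed compactness m.
    """
    d_c = math.dist(p[:3], q[:3])
    d_s = math.dist(p[3:], q[3:])
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)


# Usage: a 100x100 image split into about 100 superpixels.
S = grid_interval(100 * 100, 100)
print(S)  # 10.0
```

Larger m makes superpixels more compact (space dominates); smaller m lets them follow color boundaries more closely.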

Image Preprocessing
The preprocessing phase consists of gray-band computation and image spectral enhancement. Gray-scale computation relies on the idea that the three color bands of an image can be described by a single gray band, which reduces the representation of each pixel from 24 bits to 8 bits, so the gray intensity ranges from 0 to 255. The reason for distinguishing such an image from a color image is that less information needs to be supplied for each pixel. In addition, a gray-scale image is sufficient for many tasks, so there is no need to use a more complicated and harder-to-process color image. The colored image is converted to gray scale because SIFT is applied only to gray images.

Features Extraction
The SIFT method is used to extract many keypoints from the images, which can be regarded as good image features for the description process. These features are pieces of information relevant to the computational task associated with the description. Most SIFT features are found clustered around specific structures in the image, such as points, edges, or objects. SIFT features are also invariant to various factors and highly distinctive; therefore, a single feature is likely to be correctly matched against a large database of features. Figure-5 shows the block diagram of the SIFT feature extraction process.

KNN Image Classification
The Brute-Force (BF) matcher is straightforward: it takes the descriptor of one feature in the first set and matches it against all the features in the second set using a distance computation, and the closest one is returned. BF-KNN uses the same idea as BF matching but returns the k best matches, where k is a number less than or equal to the number of features in the feature vector. KNN is used to match the feature vector of each image segment with the feature set,
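Brute-force k-best matching can be sketched in pure Python. OpenCV provides the same idea through `cv2.BFMatcher` and its `knnMatch` method; the sketch below avoids that dependency, and `bf_knn_match` with its (query index, train index, distance) output layout is a hypothetical choice.

```python
import math


def bf_knn_match(query_desc, train_desc, k=2):
    """Brute-force k-nearest-neighbour matching.

    Every query descriptor is compared with every train descriptor; for each
    query the k closest train descriptors are returned as
    (query_index, train_index, distance) triples, nearest first.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train_desc))
        matches.append([(qi, ti, d) for d, ti in ranked[:k]])
    return matches


# Usage with toy 2-D descriptors.
print(bf_knn_match([(0.0, 0.0)], [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)], k=2))
# [[(0, 1, 1.0), (0, 2, 2.0)]]
```

The cost is proportional to the product of the two set sizes, which is why larger k and more segments increase processing time.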

Image Segmentation Result
The medium resolution of the images used in the present work makes the size of the resulting image segments appropriate; the results of the image segmentation are shown in Figure-

[Figure 7: Sample images of the MICC-F2000 dataset; images in the upper row are authentic, while images in the lower row are their forged versions [3]. Panels: original and tampered sample images (2) and (3).]

segments and no information may be found. Such segmentation results make it possible to locate objects in terms of the lines, curves, and boundaries of the input color images. The threshold value (T=0.4), which determines in advance the number of matching segments and also the image size, was found to be the best value based on several experiments. It was also found that changing the image resolution decreases the accuracy of the segmentation results, and this effect may change the image dimensions and angles, which directly affects SIFT feature extraction, since it mainly depends on the angles of shapes.

Result of Preprocessing
The preprocessing step is applied to the incoming image to transform the 24-bit color sample image into an 8-bit gray-scale image. This step allows the incoming image to be analyzed well by machine learning (ML), because of its direct effect on computation time and detection rate. The gray image conversion technique was applied and checked with the feature extraction methods to assess its ability to yield useful descriptors from gray-scale images. As seen in Figure-9, the weighted contribution of the three color bands (R, G, B) gave high-contrast results. The fine details of the resulting gray image are clearly visible: the image retains the same content without losing visible detail, which indicates that it is suitable as input to the next image description step.

Feature Extraction Results
The SIFT descriptor extracts features from gray images in two steps: 1) interest point detection: in the detection stage, the Hessian matrix is used to detect blob-like structures on the integral images; 2) interest point description: several interest points are observed at various scales, and the number of interest points is proportional to the amount of spectral variance in a particular area. Three test images and their corresponding interest points are shown in Figure-10. The outcome of the detection stage was used to localize the interest points. The number of interest points varies from image to image, probably depending on the spectral distribution of the image intensity. The highest peak in the histogram is taken and

any peak above 80% of it is also used to calculate the orientation. As a consequence, 64 features are obtained for every interest point, which means that n * 64 features are extracted for an image segment containing n interest points. The number of interest points varies from one image segment to another according to the spectral distribution of that region, so it is not fixed in advance. In practice, these keypoints are stored in a two-dimensional array that represents the feature array of that image segment in the database. It is noticeable that the keypoints of any image segment lie on the corners and edges of the objects found in that region, while no keypoints appear in empty regions, i.e., regions containing no objects.
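The orientation rule described here (take the highest histogram peak, plus any other peak above 80% of it) can be sketched as below. The 36-bin (10°) histogram follows Lowe's SIFT and is an assumption, since the bin count is not stated in this paper; Lowe's Gaussian weighting of samples and parabolic peak interpolation are omitted, and `dominant_orientations` is a hypothetical name.

```python
def dominant_orientations(samples, n_bins=36, peak_ratio=0.8):
    """Orientation assignment sketch.

    samples: (magnitude, angle_in_degrees) pairs from the keypoint neighbourhood.
    Builds a magnitude-weighted histogram of n_bins bins over 360 degrees and
    returns the bin-centre angles of the highest peak and of every other local
    peak reaching peak_ratio (80%) of it.
    """
    width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for magnitude, angle in samples:
        hist[int((angle % 360) // width) % n_bins] += magnitude
    top = max(hist)
    if top == 0:
        return []
    peaks = []
    for i, value in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % n_bins]  # histogram is circular
        if value >= peak_ratio * top and value > left and value >= right:
            peaks.append((i + 0.5) * width)
    return peaks


# Usage: a strong peak near 5 degrees and a secondary one near 185 degrees.
print(dominant_orientations([(10, 5), (9, 185), (3, 95)]))  # [5.0, 185.0]
```

Each returned angle yields a separate keypoint with the same location and scale, which is how SIFT produces keypoints that differ only in orientation.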

Matching Result
The SIFT results here refer to the similarity matching results of the proposed FID system. Increasing the similarity threshold (T) of the match relatively increases the number of matches (N). When two segments are matched with a higher threshold, more segments containing keypoints satisfy the threshold condition, giving true outputs in both the original and forged image parts, but mismatched keypoints are also shown incorrectly connected. Conversely, when the similarity threshold is decreased, the number of matched keypoints is relatively reduced, and this is associated with false connections between dissimilar keypoints. Many tests were performed to check the accuracy of the proposed FID method, trying different threshold values and numbers of matches. The effect of different matching thresholds and numbers of matched keypoints is illustrated in Figure-11. The 128-dimensional feature vector belonging to each keypoint of a particular section in the test image is evaluated with different similarity thresholds and numbers of matches to verify the descriptive ability of its features, with the aim of reducing the features that lead to misclassification. Numerous experiments were performed in which the detection rate was calculated while taking into account an acceptable number of descriptive features. The outputs of the proposed FID are the detected regions between two similar segments for four match threshold values (T=0.4, 0.5, 0.6, and 0.7). The detected keypoints represent the most dominant features, showing better discriminative behavior than the others. Using a fixed threshold and a fixed number of matches for all image parts does not always lead to acceptable results, because each segment has a different texture. Thus, it is necessary to find the best values of both T and N. The results show that the best FID performance occurs when T=0.4 and N=6.
These values of T and N are therefore used in the subsequent KNN detection stage.
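The text does not define the similarity threshold T precisely. One reading consistent with "raising T increases the number of matches" is Lowe's ratio test, and the sketch below assumes that reading, with N taken as the maximum number of matches kept. Both interpretations, and the name `filter_matches`, are assumptions rather than the authors' definition.

```python
def filter_matches(knn_matches, T=0.4, N=6):
    """knn_matches: one list per query keypoint, holding (query, train, distance)
    triples sorted nearest first.

    Assumed reading of T: Lowe-style ratio test, keeping the best match only if
    its distance is below T times the distance of the second-best match.
    Assumed reading of N: keep at most N surviving matches, closest first.
    """
    kept = [cands[0] for cands in knn_matches
            if len(cands) >= 2 and cands[0][2] < T * cands[1][2]]
    kept.sort(key=lambda match: match[2])
    return kept[:N]


# Usage: raising T from 0.4 to 0.7 admits one more match.
cands = [[(0, 1, 1.0), (0, 2, 5.0)], [(1, 3, 2.0), (1, 4, 3.0)], [(2, 0, 0.5), (2, 5, 4.0)]]
print(len(filter_matches(cands, T=0.4)), len(filter_matches(cands, T=0.7)))  # 2 3
```

Under this reading a small T keeps only very distinctive matches, which fits the observation that T = 0.4 gave the best trade-off.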

KNN Detection Results
Each keypoint is compared with its nearest neighbors using KNN. The value of k is chosen based on the specifications of the sample image used. Increasing k increases the number of comparisons and the processing time, and vice versa. To balance processing time against the number of required comparisons, several values of k were considered to find the one giving the best detection results. Table-1 illustrates the effect of increasing k on the processing time for this image. Thus, the radius of the circular neighborhood region needs to be tuned over various values to obtain the best classification results. The detection accuracy of the various runs, in which the radius of the circular coverage area (k) ranges from 1 to 7, differs according to the value of the radius. A radius of three pixels provided the highest detection results compared with the others. Detection accuracy increases with the radius until the optimum is reached at a radius of three pixels, which gives a mean detection score of approximately 98% when 80% of the dataset, randomly chosen, is used for the test performance measurement. Beyond that, the score fluctuates around the same level as the radius increases. The explanation for this behavior is that a larger radius allows more important information to be included in the considered region, which raises the detection level. The disadvantage of increasing k further is that it delays the detection decision, making the system consume additional time. Thus, k=3 is the best value that gives acceptable detection results.
Using this value of k allows the required comparisons to be performed in a reasonable amount of processing time. Matched keypoints are unlikely to be found beyond the third neighbor, so any further comparison can be considered a waste of time.

Matching between image segments is carried out by comparing the current segment with all the remaining ones. When the keypoints of the current segment are the same as those of another segment, the two segments are considered similar. In that case, the classifier labels the two closest similar image segments as forged image parts. The numerical comparison uses the distance between the two feature vectors belonging to the segments under consideration to determine how close they are. To verify the results, the locations of the two detected forged portions are compared with the mask associated with the handled image in the dataset. If the locations of both detected forged portions match the mask, the detection percentage for that image is 100%; if neither location matches, it is 0%; and if only one location matches the mask, it is 50%. Table-2 shows the detection scores obtained by the KNN algorithm based on the SIFT descriptor. Figure-12 shows the behavior of the KNN classifier given in Table-2, indicating forgery detection for ten randomly chosen forged images from the dataset. These results show that the mean forgery detection score of the KNN algorithm, computed as the mean (µ) of the detection scores over ten runs of the proposed FID method, was about 98.514%, with a standard deviation (σ) of 2.013. These encouraging results show that the SIFT descriptor works well with the classifier to obtain better forgery detection, helping the classifier reach high detection rates.
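The per-image scoring rule (100%, 50%, or 0% depending on how many detected portions agree with the mask) and the mean and standard deviation over runs can be sketched as below. The helper names are hypothetical, the toy scores are illustrative rather than the values in Table-2, and the use of the sample (rather than population) standard deviation is an assumption, since the paper does not say which was used.

```python
import statistics


def detection_percent(first_region_ok, second_region_ok):
    """Per-image score: 100% if both detected portions agree with the mask,
    50% if only one does, 0% if neither does."""
    return 50.0 * (int(first_region_ok) + int(second_region_ok))


def summarize(scores):
    """Mean and standard deviation of per-run scores.
    Assumption: sample standard deviation (statistics.stdev)."""
    return statistics.mean(scores), statistics.stdev(scores)


# Usage with illustrative scores, not the paper's data.
runs = [detection_percent(True, True), detection_percent(True, True),
        detection_percent(True, False), detection_percent(True, True)]
print(summarize(runs))  # (87.5, 25.0)
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population form if that is what the reported σ denotes.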

FID Results Evaluation
The proposed algorithm was tested using 248 images selected from the dataset. The FID results evaluation is an important test based on the remaining 20% of the dataset that was not previously used in the test. Both TP and TN are calculated in this test to assess the performance of the proposed approach, together with the FP and FN errors that determine the accuracy of the process. These parameters are used to compute three performance measures: Precision, Recall, and F-score. Figures-(13,14) show some original and tampered images used in the testing, and the achieved values of the evaluation parameters are given in Table-3; the average processing time for each test was about 4 sec.
Table 3-Confusion matrix for two classes (authentic, forgery).
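The three measures named above follow directly from the confusion-matrix counts. A minimal sketch, assuming forgery is the positive class; the function name and the toy counts are illustrative and are not the values in Table-3.

```python
def evaluate(tp, fp, fn, tn):
    """Precision, recall, F-score and accuracy from a two-class confusion matrix
    (positive class = forgery). Zero denominators yield 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f_score, accuracy


# Usage with toy counts.
p, r, f, a = evaluate(tp=9, fp=0, fn=1, tn=10)
print(round(p, 3), round(r, 3), round(f, 3), round(a, 3))  # 1.0 0.9 0.947 0.95
```

The F-score is the harmonic mean of precision and recall, so it stays low unless both are high.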

FID Results Investigation
To check the efficiency of forged-portion detection in the handled image, the image segments with the highest matching probability were retained, while those with the smallest probabilities were eliminated. This procedure, non-maximum suppression (NMS), is commonly used for object detection: the location of each image segment with an accepted probability is compared with that given in the original mask to find the amount of overlap between them. The measure that describes how closely these regions coincide is the intersection over union (IOU), which measures the overlap between two image regions belonging to different image references. IOU is computed by dividing the area of the intersection of the two regions by the area of their union. Thus, when there is no intersection between the two regions, the IOU is zero, and when they coincide completely, the IOU is one. A partial intersection indicates the degree to which the location of the compared image regions was identified. The result of this procedure is a box enclosing the original and tampered regions in the target image, as shown in Figure-15. Applying IOU to 100 tampered image samples, each containing only two similar regions, showed that another identical portion was always found, despite a small shift in the location of the detected tampered region compared with its location in the mask image. In general, the overlap rate was about 70%, which confirms the existence of a tampered region in the image at a shifted location. Several tests showed that the method can detect the tampered regions at their positions in the mask image. A lower IOU percentage does not diminish the importance of the method; it indicates the existence of a tampered portion that does not exactly fit its location in the mask image.
The IOU results shown in Figure-(19-b and d) show two green boxes in each image, enclosing the tampered and original portions, which indicates effective detection of both the tampered and the original portions. These results typically have an IOU of about 0.5 because the box is expanded to be larger than the region in the image mask. In such cases, the method succeeds in deciding whether the image is forged and also gives the approximate location of the tampered region. Many tests showed that this method never failed to detect tampering, although it gave approximate results for the tampered locations. Thus, the FID results achieved 100% forgery detection.
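The IOU computation and the Step 8 rule of Algorithm (1) can be sketched for axis-aligned boxes. Representing both the detection and the mask region as boxes (x1, y1, x2, y2) is an assumption made for brevity; a pixel-mask IOU follows the same intersection-over-union ratio. The helper names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)
    with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


def is_good_detection(detected_box, mask_box, threshold=0.5):
    """Step 8 of Algorithm (1): good detection when IOU >= 0.5."""
    return iou(detected_box, mask_box) >= threshold


# Usage: a detected box slightly larger than the mask region.
print(iou((0, 0, 4, 4), (0, 0, 4, 3)))  # 0.75
```

The usage line also shows why an expanded box lowers IOU even when the tampered region is fully covered: the extra area enlarges the union but not the intersection.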

Conclusions
Throughout the implementation, it was concluded that SLIC segmentation is appropriate for medium-resolution images, where using a proper number of segments (n) gives acceptable segmentation results. The threshold value (T=0.4) in the SLIC method determines in advance the number of segments and also the image size. Changing the image resolution decreases the accuracy of the segmentation results, which affects feature extraction by the SIFT descriptor. Detection accuracy improves as the radius increases until the optimum is reached at a radius of three pixels, giving a mean detection value of approximately 98%. The mean forgery detection value of the KNN classifier, computed as the mean (µ) of the detection scores over ten runs of the proposed FID method, was about 98.514%, with a standard deviation (σ) of 2.013. The measured average processing time for FID implementation was about (3)(4) hours.