Adaptive Motion Compensated Spatial Temporal Filter of Colonoscopy Video

Colonoscopy is a widely used procedure for detecting abnormalities in the colon, and early diagnosis can help many patients recover. The purpose of this paper is to remove or reduce artifacts in colonoscopy videos in order to improve their visual quality and provide better information for physicians. This work complements a series of three previously published papers. In this paper, optic flow is used for motion compensation: a number of consecutive images are registered so that their information can be integrated into a new image that reveals more than the original one. Colon images were classified into informative and noninformative images by using a deep neural network. Then, two different strategies were used to treat informative and noninformative images. Informative images were treated by using Lucas Kanade (LK) with an adaptive temporal mean/median filter, whereas noninformative images were treated by using Lucas Kanade with a derivative of Gaussian (LKDOG) and an adaptive temporal median filter. A comparison showed that this work achieved better results than state-of-the-art strategies on the same data set of degraded colon images. The proposed algorithm reduced the alignment error to about 0.3 of its value before alignment (from 6 to 1.6), with a 100% successful image alignment ratio. In conclusion, this algorithm achieved better results than the state-of-the-art approaches in enhancing the informative images, as shown in the results section; it also helped to reveal some information from noninformative images that have very few or no details.


Introduction
A series of our work, consisting of three previously published papers, described denoising different types of distorted colon images. To the best of our knowledge, we are the first to have conducted these kinds of experiments with colon images in a series of work. The main goal is to improve the visual quality of colonoscopy videos to provide better information for physicians. This work was the first that was able to remove very large areas of specular highlight from colon images by dealing with informative and non-informative images separately. Hence, a comparison was made between the previous and the current works by using the alignment error and other metrics, which are explained in detail in previous reports [1][2][3]. Some artifacts in the colon images are caused by light reflected from the colonoscopy device. To clarify, light from the endoscope is sometimes reflected directly to the camera because of the wet surface of the colon, causing bright white patches called specular highlights to appear in the image [1,4]. In some cases, the specular highlight dominates the whole image, and nothing is visible. In other cases, the specular highlights are relatively small, and they move around in the image as the camera is moved. In the latter case, specular highlights may be removed by image processing techniques that combine information from adjacent images in the colonoscopy video. Traditional image restoration techniques such as median filtering or Gaussian smoothing are not effective for removing specular highlights in colon images because each specular highlight covers a region of the input image rather than individual pixels. Depending on the orientation of the patient's colon relative to the light source, these regions can also vary significantly in size, and in some cases, the specular highlight may dominate almost the whole image. Our objective is to enhance the visual quality of colonoscopy images by removing the specular highlights. 
To do this, we integrate information from adjacent images in the video sequence to both detect the specular highlight and replace the incorrect pixel values with the correct pixel values from adjacent images. In this paper, a new motion compensation-based spatial temporal filter is proposed to enhance the quality of colonoscopy images. With this approach, specular highlights can be removed so that visually important image features become visible even in highly distorted images. The rest of this paper is organized as follows. A summary of previous work is presented in section 2. Section 3 describes the current approach that enhances the quality of colonoscopy images. A description of the proposed approach is provided in section 4. The evaluation metrics are overviewed in section 5, while section 6 describes the implementation and experimental results. Finally, section 7 contains conclusions and some suggestions for future work.

2. Literature Review

2.1 Specular Highlight

Specular and diffuse reflections are the two kinds of light reflection. Chromaticity and noise analysis can be used to separate the reflection components for any light direction and surface roughness [5]. Specular reflection is generated when incident light is reflected in a single direction. Specular reflections cause many difficulties for computer vision tasks such as image segmentation and object detection and matching [1][2][3][6]. Many methods have been proposed to remove specular highlights from images. Some researchers have used a single image for this purpose [5]. Other authors have used multiple images [1,2,6,7,8,9]. Many other methods regarding specular and highlight problems can be found in a previously published literature survey [4]. In our application, we have used multiple images of the colon surface that are captured from different views and suffer from specular highlights which move gradually from frame to frame in the colonoscopy video. We used this information to locate and remove small specular highlights from colonoscopy images. In our earlier approach [1], artificial intelligence was used to classify images as informative or non-informative. In [2,3], some methods were used to treat and align colon images by using the classification results from our first study. RANSAC and LKDOG were used to treat the specular highlight in informative and non-informative images, respectively. In [2], the noninformative images were excluded because RANSAC was not able to align them, but in [3], a newly proposed approach was used to treat all images, even those with high distortion. The results showed that LKDOG helped to remove large specular highlight areas in the noninformative images and converted them to informative images. However, LKDOG was of little help for informative images that suffer from individual or small areas of specular highlights. 
In some cases, several consecutive distorted images are difficult to align because of the lack of overlapping information, and this damages the original image after alignment. Hence, the new algorithm was proposed to deal with the informative and non-informative images differently. The proposed algorithm uses LK for informative images and LKDOG for noninformative images. In 2020, an approach was proposed [10] that uses principal component analysis to obtain sparse parameters which can be used to remove the specular highlight in an endoscopic image.

Optic Flow
One of the main fields in computer vision is motion analysis. It is the key element in interpreting the observed phenomena in an image sequence as a combination of object motion and/or camera movement. Motion estimation results can be used in many applications such as image alignment, robot navigation, object tracking, quantifying deformations, retrieving dominant motion, detecting abnormal behavior, and many others. Estimation of a dense motion field, which is called optic flow, is the low-level characterization. Most high-level tasks build on the optic flow estimate to achieve a particular goal [11][12]. Differential methods, which are the most widely used techniques for optic flow computation in image sequences, were previously described [13][14][15][16]. With the assumption that image intensity remains constant during object/camera motion, the fundamental optic flow constraint was derived, which describes the optic flow in terms of the spatial and temporal derivatives of an image sequence. This yields one equation with two unknowns describing the motion at each pixel location. To solve this under-constrained problem, Horn and Schunck used variational methods that imposed global smoothness constraints on the optic flow motion vectors [14]. On the other hand, Lucas and Kanade (LK) proposed an approach that used least squares to solve for the motion field using information from a small neighborhood around each pixel location [13].

Azawi
Iraqi Journal of Science, 2021, Vol. 62, No. 11, pp: 4148-4157

Recent comparisons of optic flow techniques found that the LK approach was robust under noise and achieved a smaller average error than many other optic flow methods that were tested [15]. Another recent comparison studied the effect of noise on the performance of the Horn and Schunck and LK algorithms. LK showed greater resistance to noise than Horn and Schunck, while the latter showed sharper motion boundaries than LK. Based on this analysis, the authors applied the Horn and Schunck algorithm with coarse-to-fine optic flow to measure the static deformation of a birdlike flexible airfoil [17]. Different approaches were proposed to enhance optic flow estimation. For example, Sharmin suggested smoothing all the resized images in the LK pyramid, and the results showed that the performance was better than smoothing only the first image in the pyramid [16]. LK has been widely used in image alignment; for example, feature-based LK was used along with active appearance models for face alignment [18]. The results showed that using warping image features (HOG and SIFT) at each iteration is better than extracting features after warping. An algorithm was described by other studies [19,20] that helped to speed up Lucas Kanade and most optic flow algorithms while preserving their accuracy. In one of these studies [18], mutual information (MI) with Lucas Kanade was used to speed up the performance (a 15% improvement was achieved). Other works [21,22] were performed to improve optic flow accuracy. One proposed algorithm [21] suggested adding overfine interpolated levels to the pyramid to improve coarse-to-fine optical flow accuracy. Their approach reduced the error by 10-30%. LK with the derivatives of Gaussian method (LKDOG) was proposed [23]. The authors employed the Gaussian and the derivative of Gaussian to calculate Ix, Iy, and It.
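To make the idea of Gaussian and derivative-of-Gaussian filtering concrete, the following minimal sketch builds sampled 1D kernels and applies one to a signal. It is an illustration only: the function names, the value of sigma, and the kernel radius are assumptions and are not taken from [23].

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1D Gaussian, normalized so that the weights sum to 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_derivative_kernel(sigma, radius):
    """Sampled derivative of the Gaussian: g'(x) = -x / sigma^2 * g(x)."""
    g = gaussian_kernel(sigma, radius)
    xs = range(-radius, radius + 1)
    return [-x / (sigma * sigma) * w for x, w in zip(xs, g)]

def convolve_1d(signal, kernel):
    """Discrete convolution (kernel flipped), valid region only."""
    n = len(kernel)
    flipped = kernel[::-1]
    return [sum(signal[i + j] * flipped[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

# Assumed parameters for the demonstration.
smooth = gaussian_kernel(sigma=1.0, radius=3)
deriv = gaussian_derivative_kernel(sigma=1.0, radius=3)

# A ramp with slope 1: the derivative-of-Gaussian response is close to 1.
ramp = [float(k) for k in range(11)]
slopes = convolve_1d(ramp, deriv)
```

Applying the derivative kernel along x, y, or t (and the smoothing kernel along the other axes) gives smoothed estimates of Ix, Iy, and It; a larger sigma trades spatial detail for robustness to noise.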

Methods
Colonoscopy images suffer from large, bright specular highlights because of the wet surface of the colon. Therefore, these images contain many outliers and a lot of noise. The goal of our approach is to process colonoscopy images to identify and remove unwanted specular highlights. To do this, we perform motion compensated spatial temporal filtering. In this paper, the algorithm has two main phases which are explained in the following two sections (3.1 and 3.2).

Motion Estimation
The fundamental optic flow constraint for the Lucas Kanade and LKDOG methods is as follows [5]:

$$I_x(x,y,t)\,u + I_y(x,y,t)\,v + I_t(x,y,t) = 0 \qquad (1)$$

Here $I_x(x,y,t)$, $I_y(x,y,t)$, and $I_t(x,y,t)$ are the spatial and temporal image intensity derivatives, and $u$ and $v$ are the horizontal and vertical components of optic flow. Since the fundamental optic flow constraint provides only one equation for two unknowns, the LK method considers a local window around each pixel in the image and uses the least squares solution to the resulting system of equations. The objective function that was used to achieve the best fit that minimizes the error is as follows; see [11,22] for more details:

$$\min_{u,v} \sum_i \left( I_x(x_i,y_i,t_i)\,u + I_y(x_i,y_i,t_i)\,v + I_t(x_i,y_i,t_i) \right)^2$$

The object motion for each pixel (x,y,t) in the input image sequence was determined first. Then a 4D motion field was used to calculate the motion vector around each pixel using recursive formulas that take the motion field after n-1 frames and the single-frame motions Mx and My at the corresponding (x,y,t) location at frame n-1 and calculate the motion field after n frames.
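The least squares step for a single window can be sketched as follows. Setting the gradient of the objective to zero gives a 2x2 linear system in u and v, which is solved here in closed form. This is a minimal sketch, not the paper's implementation; the function name, the flat-list input format, and the eps tolerance are assumptions.

```python
def solve_lk_window(ix, iy, it, eps=1e-9):
    """Least squares flow (u, v) for one window.

    ix, iy, it are flat lists holding the derivatives Ix, Iy, It sampled at
    the pixels of the window. Returns None when the 2x2 system is singular
    (for example, when the window has gradients in one direction only).
    """
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * c for a, c in zip(ix, it))
    syt = sum(b * c for b, c in zip(iy, it))

    # Normal equations: [sxx sxy; sxy syy] [u; v] = [-sxt; -syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < eps:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic check: build It from a known motion so that constraint (1) holds.
true_u, true_v = 0.5, -0.25
ix = [1.0, 0.0, 2.0, 1.0]
iy = [0.0, 1.0, 1.0, -1.0]
it = [-(a * true_u + b * true_v) for a, b in zip(ix, iy)]
flow = solve_lk_window(ix, iy, it)
```

Returning None for a near-singular system is one simple way to flag windows where the motion cannot be determined; other implementations may regularize the system instead.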

Spatial Temporal Filter
The temporal filtering algorithm looks at the same (x,y) location in a sequence of N aligned consecutive frames to see what this time series of pixel values contains. Normally, we would expect pixel values to gradually increase or decrease in brightness, as shown in Figure 1. If we look at the time series of pixel intensity values at a location (x,y) in the neighborhood of noise or artifacts, we will see pixels suddenly become brighter or darker for a few frames and then return to the original intensity value.
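The effect of a temporal median on such a time series can be illustrated with a small sketch. It assumes the frames are already aligned and stored as grayscale lists of rows; the function name and the pixel values are made up for the example.

```python
from statistics import median

def temporal_median(frames):
    """Per-pixel median over a list of aligned grayscale frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(frame[r][c] for frame in frames) for c in range(cols)]
            for r in range(rows)]

# Five aligned 1x2 frames. The first pixel is hit by a bright highlight
# (255) in one frame; the second pixel only drifts slowly in brightness.
frames = [
    [[100, 50]],
    [[102, 51]],
    [[255, 52]],
    [[104, 53]],
    [[106, 54]],
]
filtered = temporal_median(frames)
```

Because the highlight affects only a minority of the frames at that location, the median ignores it, while the slowly varying pixel keeps a value from the middle of its range.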

Adaptive Spatial Temporal Filter
A global threshold and the standard deviation were used with the adaptive spatial temporal filter. The global threshold was calculated by finding the mean value for each pixel in the consecutive sequence. Then, the standard deviation S for each pixel in the consecutive sequence was calculated. After that, S was compared with the global threshold to check whether the variation at that pixel is high or low. Based on the comparison result, the algorithm applies either the temporal median or the temporal mean.
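The decision rule can be sketched per pixel as follows. This is an illustration under stated assumptions: the global threshold is passed in by the caller rather than derived from the image, the population standard deviation is used, and the function name and sample values are hypothetical.

```python
from statistics import mean, median, pstdev

def adaptive_temporal_filter(frames, threshold):
    """Per pixel: temporal median if the temporal std exceeds the
    threshold, temporal mean otherwise."""
    rows, cols = len(frames[0]), len(frames[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            series = [frame[r][c] for frame in frames]
            s = pstdev(series)  # temporal standard deviation S at (r, c)
            out_row.append(median(series) if s > threshold else mean(series))
        out.append(out_row)
    return out

# First pixel: one bright outlier, so S is large and the median is used.
# Second pixel: small variation, so S is small and the mean is used.
frames = [
    [[100, 50]],
    [[102, 52]],
    [[255, 51]],
    [[104, 53]],
    [[106, 59]],
]
result = adaptive_temporal_filter(frames, threshold=10.0)
```

The mean averages out mild noise when the series is stable, while the median is the safer choice when a few frames contain strong outliers such as specular highlights.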

The Proposed Approach Algorithm
In the previous work, a median temporal filter with LKDOG was used [3]. This method helped to remove large areas of specular highlight that exist in some noninformative images, but it was of little help for informative images that have small areas of specular highlights or a few specular highlight pixels. The algorithm in this paper has two phases. The first phase estimates motion from a sequence of images. The second phase applies an adaptive spatial temporal filter to the successfully aligned images from the first phase. Two different optic flow methods, instead of one, were used in this study. First, the images were classified into informative and noninformative images using neural networks. The performances of several machine learning algorithms were tested for classification, including random forest, closest centroid, backpropagation neural network, and deep neural network. The accuracy range was 92%-98%. We chose the deep neural network result because it achieved the highest accuracy. Analysis details can be found elsewhere [1]. Then, the algorithm uses the LK method with informative images and the LK derivatives of Gaussian optic flow method (LKDOG) with noninformative images. Also, the enhanced approach proposed in this paper uses an adaptive temporal filter instead of only the temporal median used in the previous work. The adaptive temporal filter in the current article uses either mean or median when dealing with non-informative images. The decision to apply the temporal mean or the temporal median was made based on the comparison between the standard deviation for each pixel and the global threshold for the input color image. 
The traditional median can remove some individual outliers, but the adaptive temporal filter can treat both large and small areas of outliers, which in our case are the specular highlights, by applying a spatial temporal median or a spatial temporal mean filter depending on the comparison between the spatial temporal standard deviation and the global threshold. If the standard deviation is high, which means there are many outliers, then the spatial temporal median is applied; otherwise, the spatial temporal mean is applied. The proposed solution uses LK and LKDOG, and it uses the motion estimated by either LK or LKDOG as an input to the adaptive temporal filter. To clarify, the input colon images are classified into informative and non-informative images. Then, the proposed algorithm checks the label of each image, whether it is informative or not, and processes each class of images separately. The 4D spatial temporal filter is used in both cases. LK optic flow is applied to the informative images, and the pairwise motion for each pixel is accumulated after each motion estimate. The proposed method also calculates the standard deviation for each pixel (STDV), which represents the standard deviation for the 4D spatial temporal filter, and compares it with the global threshold for the processed image. The temporal median is applied if STDV is greater than the global threshold; otherwise, the temporal mean filter is applied. If the processed image is noninformative, the proposed algorithm follows the same procedure, except that it uses LKDOG instead of LK optic flow. Then, the algorithm applies the temporal median if STDV(x,y) is greater than the global threshold for the processed image. Finally, the processed images are stored as an enhanced version of the original colon video.
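The control flow described above can be summarized in a short sketch. Every callable here is a hypothetical placeholder supplied by the caller, and aligning to the middle frame of the window is an assumption made for illustration; the sketch only shows how the class label selects the optic flow method before filtering.

```python
def enhance_frame(frames, is_informative, flow_lk, flow_lkdog, warp,
                  temporal_filter):
    """Enhance one window of consecutive frames.

    Hypothetical callables:
      flow_lk / flow_lkdog(reference, frame) -> motion field
      warp(frame, motion) -> frame aligned to the reference
      temporal_filter(aligned_frames) -> filtered image
    """
    # Informative images use LK; noninformative images use LKDOG.
    estimate_flow = flow_lk if is_informative else flow_lkdog
    reference = frames[len(frames) // 2]  # assumption: align to middle frame
    aligned = [warp(frame, estimate_flow(reference, frame))
               for frame in frames]
    return temporal_filter(aligned)

# Smoke test with stand-in callables (no real optic flow is computed).
result = enhance_frame(
    frames=["f0", "f1", "f2"],
    is_informative=False,
    flow_lk=lambda ref, frame: "lk",
    flow_lkdog=lambda ref, frame: "lkdog",
    warp=lambda frame, motion: (frame, motion),
    temporal_filter=lambda aligned: aligned,
)
```

Passing the flow estimators and the filter as parameters keeps the dispatch logic separate from any particular LK, LKDOG, or filter implementation.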

Evaluation Metrics
The evaluation metrics used in this work are the same as those used in the previously published work [1][2][3]. Two kinds of evaluation metrics were used: objective and subjective. The first objective evaluation metric uses the mean absolute value to calculate the motion displacement in the x and y directions for n consecutive images; we expect the mean absolute value after alignment to be lower than that before alignment. The second objective evaluation metric calculates the percentage of successfully aligned images:

$$\text{Percentage} = \frac{\#\,\text{successfully aligned images}}{\text{total number of images}} \qquad (6)$$

For the subjective evaluation, the visual quality of the images is considered after they are aligned using the method proposed in this paper.
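The two objective metrics can be sketched as follows. The exact definition of the first metric is given in the earlier papers [1][2][3]; this version, which averages |u| and |v| separately over all pixels, is an assumption, as are the function names and the sample values.

```python
def mean_absolute_displacement(u, v):
    """Return (mean |u|, mean |v|) over all pixels of a motion field.

    u and v are lists of rows holding the horizontal and vertical motion.
    """
    abs_u = [abs(x) for row in u for x in row]
    abs_v = [abs(y) for row in v for y in row]
    return sum(abs_u) / len(abs_u), sum(abs_v) / len(abs_v)

def alignment_success_percentage(num_aligned, num_total):
    """Ratio from equation (6), expressed as a percentage."""
    return 100.0 * num_aligned / num_total

# Made-up 2x2 motion field for illustration.
u = [[1.0, -3.0], [2.0, -2.0]]
v = [[0.0, 0.0], [-1.0, 1.0]]
mad_u, mad_v = mean_absolute_displacement(u, v)
success = alignment_success_percentage(20, 20)
```

A lower mean absolute displacement after alignment than before indicates that the residual motion between frames has been reduced.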

Implementation and experimental results
Unlike our previous work [3], the proposed approach used LK instead of the LKDOG method to treat informative images, because the LKDOG algorithm could not remove individual specular highlights [3]. Hence, some experiments were conducted by using LK optical flow with the temporal filter to treat individual or small areas of specular highlights that exist in the informative images. Figure 2 shows the resulting enhanced version after using LK optic flow. It is clear from the figure that the distorted informative images are enhanced by removing some of the specular highlights from the original images. To evaluate the performance of the proposed algorithm, the error values before and after alignment were calculated based on the formula described in the evaluation metrics section. The error value before alignment was 6, while that after alignment was 1.6. Also, the percentage of successful image alignment, which is the ratio between the successfully aligned images and the total images in a video, was measured at 100%. For the subjective evaluation, Figure 2 (b) shows that the current approach helped to reduce the specular highlights without distorting the original informative colon image. It is also clear from Figure 2 (b) that LK performs much better than LKDOG in enhancing the informative images. From the results in Figure 2 (a), it can be observed that LKDOG could not enhance the informative images and, in some cases, caused more distortion or blurriness, as appears in the third image in Figure 2 (a). Hence, using LK was helpful for enhancing the informative images. This is attributed to the fact that LK can deal with small motion, while LKDOG can estimate large motion between a pair of frames. 
Small motion causes small-area or individual specular highlights, which exist in the informative images, while large motion causes large areas of specular highlight, which exist in the noninformative colon images.

Figure 2- (a) The blue shapes indicate good effects while the purple shapes indicate damaging effects, where the specular highlights sometimes increased. (b) The enhanced version results after applying LK followed by adaptive temporal median/mean (the enhanced proposed approach in this paper). Images on the left are the distorted informative images and images on the right are the enhanced versions.

The enhanced new approach applies the LKDOG method when the image is noninformative. The resulting versions shown in Figure 3 are for non-informative images (where the specular highlight dominates large parts of the image). It is clear that large parts of the specular highlight in the images were removed.

Figure 3- The enhanced version results after applying LKDOG followed by temporal median [3].

In conclusion, applying the LK optic flow method followed by the temporal filter helped to improve informative images. On the other hand, applying LKDOG helped to improve noninformative images.

7. CONCLUSION AND FUTURE WORK

Colorectal cancer is a very common cancer worldwide. Colonoscopy is one of the common procedures used to detect colorectal cancer and many other abnormalities. Hence, colon images need to be clear and rich in detail. Because of the wet surface, some areas are occluded and need to be treated to reveal the actual details. The goal of this paper is to remove specular highlights by using an adaptive spatial temporal filter instead of the temporal median only, and by using two different optic flow methods to treat the informative and the non-informative images separately. LKDOG helps to remove large areas of specular highlight, but it could not remove small or individual specular highlights. Thus, LK was used with the adaptive temporal filter to remove the small or individual specular highlights. The results show that LK can help to remove individual specular highlights or small areas of specular highlight that exist in some informative images. Classifying the colon images into informative and non-informative images and treating them separately helps to discover which algorithm is suitable for each class. Using the LK method with the adaptive spatial temporal filter, instead of LKDOG with the temporal median, enhances the informative images by removing individual specular highlights as well as some small specular highlight areas. The alignment error of the aligned informative images was reduced from 6 to 1.6, with 100% of the images successfully aligned. Future work will include enhancing the designed algorithm to remove the specular highlights completely from the informative and noninformative images.

ACKNOWLEDGMENT

The author would like to thank Mustansiriyah University (www.uomustansiriyah.edu.iq) Baghdad-Iraq for its support in the present work. This work has no funding.