Ultrasound Images Registration Based on Optimal Feature Descriptor Using Speeded Up Robust Feature

Image registration plays a significant role in medical image processing. This paper proposes improving the accuracy and performance of the Speeded-Up Robust Features (SURF) algorithm for creating Extended Field of View (EFoV) ultrasound (US) images by applying different matching measures. These measures (Euclidean distance, cityblock distance, variation, and correlation) were applied in the matching stage of the SURF algorithm. US image registration (fusion) was then performed using the control points obtained from each matching measure. An algorithm that selects the matched points with the highest location frequency was proposed to produce and enhance EFoV US images, because it selects the most accurate matching points. The resulting fused images were evaluated subjectively and objectively. The objective assessment was conducted by calculating the execution time, the peak signal-to-noise ratio (PSNR), and the signal-to-noise ratio (SNR) between the registered images and a reference image fused manually by a physician. The results showed that the cityblock distance gives the best result, with the highest PSNR and SNR and the lowest execution time.


Introduction
Ultrasound (US) imaging is one of the most widely applied medical imaging modalities today because of its many advantages. It uses a mechanical longitudinal pressure wave with a frequency above the upper limit of human hearing [1]. The advantages of US include safety, low cost, non-invasiveness, portability, and real-time operation, which make it a valuable tool for showing accurate details of soft tissues in medicine [2].
Measuring the dimensions of visible anatomic lesions or structures with an ultrasound device is a common step in medical treatment. In contrast with other modalities such as magnetic resonance imaging or computed tomography, ultrasound transducers allow the radiologist to scan any area from any view because of their mobility and small size. However, US images suffer from several types of artifacts, such as shadowing, reverberation, mirror image, poor enhancement, and comet-tail. In addition, a main disadvantage of ultrasound transducers is that they are unsuitable for documenting comparatively large structures [3]. Anatomic structures whose dimensions exceed those of the transducer can therefore be documented only by sequential images. Because the field of view (FoV) of linear transducers, limited by the typical 4-6 cm width of the probe, is too small to depict such structures in one image, an extended US FoV is used to show large anatomic regions of abnormally enlarged organs or massive lesions. Extended field of view (EFoV) US is a technical modification of conventional US that provides images with an extended anatomic FoV while preserving the advantages of conventional US, such as low cost, high spatial resolution, and the absence of ionizing radiation [3,4]. In contrast to the static EFoV images acquired before the 1980s with articulated-arm scanners, the technique introduced by Weng and colleagues [4] enabled real-time EFoV imaging without external sensors. Conventional US is limited in depicting whole glands, such as a hyperplastic thyroid gland, because the thickness and length of the thyroid cannot be contained in one image.
Image registration techniques can be categorized as gradient-based, area-based, and feature-based techniques, and numerous hybrid techniques are also possible. In feature-based registration, correspondences (also called control points) are found between features such as edges, contours, line intersections, closed-boundary regions, and corners extracted from the target image and those extracted from the source (reference) image. A set of feature descriptors, based on similarity measures and spatial relationships, is used for this purpose. Feature-based techniques generally perform registration comparatively quickly, but they can lack robustness in feature extraction and accuracy in feature matching [5].
Feature extraction is an important step for the accuracy of EFoV images, and its usefulness depends largely on the feature matching stage. Lukashevich et al. presented an image registration algorithm based on SURF features for CT images [6]. To the authors' knowledge, the SURF algorithm has not previously been applied to EFoV US images. In this paper, several commonly used similarity measures, namely the Euclidean distance, cityblock distance, variation, and correlation coefficient, are evaluated in order to find a suitable similarity measure for EFoV US image registration. In addition, because both correct and wrong matching lines occur between two adjacent US images, a histogram-based algorithm is proposed to select the most accurate matching points, so that an accurate EFoV US image is obtained whenever there are enough correct matching lines. The rest of the paper is organized as follows: section two presents the materials and methods, section three presents the implementation results, and section four gives the conclusions.

Materials and Methods
This section describes the materials, the image acquisition, the similarity measures, and the methods, including the SURF algorithm.

Equipment and Phantom
All processing of the ultrasound images was implemented on a PC with an Intel Core i7-6700HQ CPU (2.60 GHz) using MATLAB 9.5.0.944444 (R2018b). The US images were captured with a GE LOGIQ Book XP portable ultrasound machine in B-mode, using a linear transducer with a frequency of 7.5 MHz. To acquire ultrasound images of the thyroid gland, a neck phantom with lesions (CIRS, model 074) was used. This phantom contains a slightly enlarged thyroid gland inside a lifelike neck and includes the trachea, the internal jugular vein, and the common carotid artery as internal anatomical landmarks. The US images used in this work were acquired as part of a PhD study [7].

Speeded-Up Robust Feature Algorithm
In 2006, Herbert Bay proposed an efficient interest point detector and descriptor called Speeded-Up Robust Features (SURF), which is invariant to scaling, translation, and rotation [8,9]. Like the Scale-Invariant Feature Transform (SIFT), it achieves high accuracy, but it requires less computation time and is several times faster than SIFT [10].
The SURF algorithm consists of three parts, detection, description, and matching of interest points, described below, preceded by a pre-processing stage that computes the integral image.

Integral image
The integral image is an intermediate representation of the original image that reduces the complexity of box-filter convolutions and increases speed. As shown in Figure-1, the value of the integral image I_Σ(X) at a location X = (x, y) is the sum of all pixels of the original image I inside the rectangular region formed by the origin O and X [9]:
$$ I_{\Sigma}(X) = \sum_{i=0}^{x} \sum_{j=0}^{y} I(i, j) \quad (1) $$
where I_Σ(X) is the integral image and I is the original image. Once the integral image has been computed, the sum of the pixels inside any upright rectangle, such as the shaded region bounded by vertices A, B, C, and D in Figure-1, requires only four look-ups, one at each vertex, regardless of the size of the rectangle.
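To make equation (1) and the constant-cost rectangle sum concrete, the following minimal NumPy sketch builds an integral image with two cumulative sums and evaluates a rectangle sum from four corner look-ups. The function names `integral_image` and `box_sum` are hypothetical, and arrays are indexed as [row, column], that is [y, x].

```python
import numpy as np

def integral_image(image):
    """Equation (1): entry [y, x] is the sum of all pixels at rows <= y and columns <= x."""
    return np.cumsum(np.cumsum(image.astype(np.int64), axis=0), axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of image[top:bottom+1, left:right+1] using four integral-image look-ups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]        # remove the strip above the box
    if left > 0:
        total -= ii[bottom, left - 1]      # remove the strip left of the box
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]     # add back the doubly removed corner
    return int(total)

img = np.arange(1, 26).reshape(5, 5)       # small 5x5 test image
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))             # prints 117 (inner 3x3 block)
```

The cost of `box_sum` does not depend on the rectangle size, which is why SURF can evaluate box filters of any scale at the same speed.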

Detection of interest points
Scale spaces are commonly implemented as image pyramids [11]. SURF uses weighted box filters as approximations of the second-order Gaussian partial derivatives in the x, y, and xy directions. As shown in Figure-2, the white lobes represent negative coefficients, the black lobes positive coefficients, and the grey lobes zero values [12]. The SURF detector is based on multi-scale space theory, and feature detection relies on the Hessian matrix blob detector, which offers good accuracy and efficiency [13]. Given an image I and a point X = (x, y), the Hessian matrix H(X, σ) at scale σ is defined as [14]:
$$ H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{xy}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix} \quad (2) $$
where
$$ L_{xx}(X, \sigma) = \frac{\partial^2 g(\sigma)}{\partial x^2} * I(X), \quad L_{yy}(X, \sigma) = \frac{\partial^2 g(\sigma)}{\partial y^2} * I(X), \quad L_{xy}(X, \sigma) = \frac{\partial^2 g(\sigma)}{\partial x \, \partial y} * I(X) \quad (3) $$
The operator * denotes the convolution between the Gaussian kernel and the image I, and
$$ g(\sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)} \quad (4) $$
L_xx, L_yy, and L_xy are the convolutions of the image I at point X with the Gaussian second-order derivatives in the x, y, and xy directions, respectively.
Then, the determinant of the Hessian is calculated for each pixel of the image, and the points where it reaches a local maximum over space and scale are detected as interest points.
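The determinant-of-Hessian response in equations (2)-(4) can be sketched with SciPy. This is a swapped-in, exact version: it convolves with true Gaussian second derivatives via `scipy.ndimage.gaussian_filter` instead of SURF's box-filter approximation (SURF additionally weights the mixed term, det(H_approx) = D_xx D_yy - (0.9 D_xy)^2). The function name and the synthetic blob are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def hessian_determinant(image, sigma):
    """det(H) = Lxx * Lyy - Lxy**2 at scale sigma using exact Gaussian derivatives.

    Axis 0 is y (rows) and axis 1 is x (columns), so order=(0, 2) gives d2/dx2.
    """
    img = image.astype(float)
    lxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))
    lyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))
    lxy = ndimage.gaussian_filter(img, sigma, order=(1, 1))
    return lxx * lyy - lxy ** 2

# Synthetic bright blob centred at row 20, column 20
y, x = np.mgrid[0:41, 0:41]
blob = np.exp(-((x - 20) ** 2 + (y - 20) ** 2) / (2 * 3.0 ** 2))
det = hessian_determinant(blob, sigma=3.0)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(det), det.shape))
print(peak)  # (20, 20): the strongest response lies at the blob centre
```

A full detector would repeat this over several scales and keep points that are local maxima in both space and scale.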

Description of interest points
Descriptors characterize the distribution of intensity content within the neighborhood of each interest point. The SURF descriptor vector is extracted in two stages [11]:
1. A dominant orientation is determined from a circular region around the interest point.
2. A square region aligned with the dominant orientation is constructed to extract the descriptive information. The square region is divided into 4 × 4 sub-regions, and the Haar wavelet responses in the horizontal and vertical directions (denoted d_x and d_y) are weighted with a Gaussian centered at the interest point, whose width is proportional to s, the scale of the detected feature point.
Because the square region is aligned with the dominant orientation, the resulting descriptor is rotation invariant. In each sub-region, d_x, d_y, |d_x|, and |d_y| are summed, producing a four-dimensional (4D) feature vector [12]:
$$ v = \left( \sum d_x, \; \sum d_y, \; \sum |d_x|, \; \sum |d_y| \right) \quad (5) $$
Concatenating the vectors of the 16 sub-regions yields the 64-dimensional (16 × 4) SURF descriptor, which is then normalized to unit length.
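Equation (5) and the 16 × 4 concatenation can be sketched as follows. The sketch assumes the Haar responses are already sampled on a 20 × 20 grid (4 × 4 sub-regions of 5 × 5 samples, as in the original SURF design), already rotated to the dominant orientation, and already Gaussian-weighted; computing those inputs is outside its scope, and `surf_descriptor` is a hypothetical name.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """Build a 64-D SURF-style descriptor from 20x20 grids of Haar responses.

    dx, dy are assumed to be oriented and Gaussian-weighted already.
    """
    if dx.shape != (20, 20) or dy.shape != (20, 20):
        raise ValueError("expected 20x20 response grids")
    features = []
    for r in range(0, 20, 5):              # 4 sub-region rows
        for c in range(0, 20, 5):          # 4 sub-region columns
            sx = dx[r:r + 5, c:c + 5]
            sy = dy[r:r + 5, c:c + 5]
            # Equation (5): (sum dx, sum dy, sum |dx|, sum |dy|) per sub-region
            features.extend([sx.sum(), sy.sum(), np.abs(sx).sum(), np.abs(sy).sum()])
    v = np.asarray(features, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v     # unit length for contrast invariance

rng = np.random.default_rng(0)
dx0 = rng.normal(size=(20, 20))
dy0 = rng.normal(size=(20, 20))
desc = surf_descriptor(dx0, dy0)
print(desc.shape)  # (64,)
```

Because of the final normalization, scaling all responses by a constant (a uniform contrast change) leaves the descriptor unchanged.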

Image matching based on distance similarity measure
The SURF algorithm uses the Euclidean distance to match descriptor vectors and find the best matches between images [9]. In this work, the matching stage was also performed with the variation, cityblock distance, and correlation coefficient measures described below.
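A minimal sketch of descriptor matching with an interchangeable distance measure, using `scipy.spatial.distance.cdist` (whose `"correlation"` metric is 1 minus the Pearson correlation coefficient). The mutual-nearest-neighbour check is an assumption added to discard some ambiguous matches; it is not stated in the text, and the variation measure is omitted because the text does not say how it is turned into a matching score.

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_descriptors(desc1, desc2, metric="euclidean"):
    """Nearest-neighbour matching of descriptor rows under a chosen distance.

    metric is passed to cdist, e.g. "euclidean", "cityblock" or "correlation".
    Returns an array of (index_in_desc1, index_in_desc2) pairs.
    """
    d = cdist(desc1, desc2, metric=metric)
    nearest = np.argmin(d, axis=1)               # best match in desc2 for each row of desc1
    back = np.argmin(d, axis=0)                  # best match in desc1 for each row of desc2
    keep = back[nearest] == np.arange(len(desc1))  # keep mutual nearest neighbours only
    return np.column_stack([np.nonzero(keep)[0], nearest[keep]])

rng = np.random.default_rng(1)
desc1 = rng.normal(size=(30, 64))
perm = rng.permutation(30)
desc2 = desc1[perm] + 0.01 * rng.normal(size=(30, 64))  # shuffled, slightly noisy copies
for metric in ("euclidean", "cityblock", "correlation"):
    print(metric, len(match_descriptors(desc1, desc2, metric)))
```

Passing the measure as a parameter keeps the rest of the registration pipeline identical, which is what a comparison of matching measures requires.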

Variance measure
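For a vector A consisting of N scalar observations, the variance is defined as [15]:
$$ V = \frac{1}{N-1} \sum_{i=1}^{N} \left| A_i - \mu \right|^2 \quad (6) $$
where μ is the mean of A:
$$ \mu = \frac{1}{N} \sum_{i=1}^{N} A_i \quad (7) $$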
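Equations (6) and (7) translate directly into code. This sketch assumes the N - 1 (sample) normalization shown in equation (6), which matches NumPy's `np.var(..., ddof=1)`; the text does not specify how this variance is converted into a matching score between two descriptors, so that step is left out.

```python
import numpy as np

def variance(a):
    """Equations (6)-(7): V = 1/(N-1) * sum(|A_i - mu|^2), with mu the mean of A."""
    a = np.asarray(a, dtype=float)
    mu = a.sum() / a.size                                    # equation (7)
    return float(np.sum(np.abs(a - mu) ** 2) / (a.size - 1))  # equation (6)

print(variance([1, 2, 3, 4]))  # 5/3, about 1.6667
```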

Euclidean Distance
The Euclidean distance measures the distance between two feature vectors [16]:
$$ d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \quad (8) $$
where a and b are the two vectors and n is the vector length.
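Equation (8) as a short NumPy function, shown on a small worked example; the function name is illustrative.

```python
import numpy as np

def euclidean_distance(a, b):
    """Equation (8): square root of the sum of squared element differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```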

Cityblock distance (Manhattan distance)
The cityblock distance measures the distance between two feature vectors as follows [16]:
$$ d(a, b) = \sum_{i=1}^{n} \left| a_i - b_i \right| \quad (9) $$
where a and b are the two vectors and n is the vector length.
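Equation (9) in the same style. Compared with equation (8), it needs only absolute differences and a sum, with no squaring or square root, which may contribute to a lower execution time, although timings also depend on the implementation.

```python
import numpy as np

def cityblock_distance(a, b):
    """Equation (9): sum of absolute element differences (Manhattan distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

print(cityblock_distance([0, 0], [3, 4]))  # 7.0
```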

Correlation coefficient measure
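The correlation coefficient of two vectors is a measure of their linear dependence and ranges from -1 to 1. For vectors x and y, each with N scalar observations, the correlation coefficient is defined as
$$ \rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \quad (10) $$
where
$$ \operatorname{cov}(x, y) = E\left[ (x - \mu_x)(y - \mu_y) \right] \quad (11) $$
E denotes the average value, μ_x and μ_y are the mean values of x and y, respectively, and σ_x and σ_y are their standard deviations:
$$ \mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \mu_y = \frac{1}{N} \sum_{i=1}^{N} y_i $$
$$ \sigma_x = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)^2}, \quad \sigma_y = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu_y)^2} $$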
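Equations (10) and (11) can be sketched directly, using population (1/N) statistics throughout so that the result equals the Pearson coefficient returned by `np.corrcoef`. For matching, a correlation of 1 means the most similar pair, so a distance such as 1 - ρ would be minimized; that conversion is an assumption, not something the text specifies.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Equations (10)-(11): rho = E[(x - mu_x)(y - mu_y)] / (sigma_x * sigma_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # E taken as the plain average
    return float(cov / (x.std() * y.std()))          # population std, consistent with E

print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0
```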

The Proposed Registration Algorithm
Before the objective evaluation can be performed, the registration process must first be implemented. Because of the advantages and drawbacks of US images described in section one, both correct and wrong matching lines were expected between the interest points produced by the SURF algorithm. The authors' hypothesis is based on the following:
1. If all matching points (both correct and wrong) are used, the EFoV image would not be medically acceptable.
2. Based on preliminary experiments, most of the correct matching lines lie close to each other, so they have the highest location frequency.
3. Using a histogram to locate the most correct matching lines would yield EFoV US images that are acceptable from a medical point of view.
Histograms have previously been used in registration, either in a segmentation step that precedes registration [18] or for histogram matching, in which either a specified (for example, uniform) histogram is imposed or the histograms of the images are calculated and compared using a similarity metric [19,20]. All these uses differ from the proposed algorithm, in which the histogram specifies the location of the most correct matching lines between the control points produced by the SURF algorithm for any pair of adjacent US images. Accordingly, the proposed method was applied as follows to obtain an accurate EFoV US image whenever there are enough correct matching lines, even if some matching lines are wrong:
1. Calculate the histogram of the x-coordinates of the matching points in each US input image and select the location with the highest frequency, as shown in Figures-3 and 4. Figure-5 shows the results of the histogram calculation.
2. Cut image 1 in the x-direction from the first column to the value obtained in step 1, as shown in Figure-

6.
Calculate PSNR and SNR for the registered image of each method.
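Steps 1 and 2 of the histogram-based procedure can be sketched as below. Several details are assumptions, since the intermediate steps are missing from this text: the bin width (the hypothetical `bin_width=8` pixels), taking the mean x of the points in the most frequent bin as the cut column, and joining the two images by appending image 2 from its own mode column onward with no vertical shift or blending. The function names are hypothetical.

```python
import numpy as np

def mode_column(xs, width, bin_width=8):
    """Step 1 (sketch): histogram the matched x-coordinates of one image and
    return the mean x of the points in the most frequent bin."""
    xs = np.asarray(xs, dtype=float)
    edges = np.arange(0, width + bin_width, bin_width)
    counts, _ = np.histogram(xs, bins=edges)
    k = int(np.argmax(counts))
    in_bin = (xs >= edges[k]) & (xs < edges[k + 1])
    return int(round(xs[in_bin].mean()))

def histogram_stitch(img1, img2, x1, x2, bin_width=8):
    """x1[i], x2[i]: x-coordinates of the i-th SURF match in img1 and img2."""
    c1 = mode_column(x1, img1.shape[1], bin_width)
    c2 = mode_column(x2, img2.shape[1], bin_width)
    # Step 2: keep img1 from its first column up to c1.
    # Assumed join: append img2 from its corresponding column c2 onward.
    return np.hstack([img1[:, :c1], img2[:, c2:]])

scene = np.arange(10 * 150).reshape(10, 150)
img1, img2 = scene[:, :100], scene[:, 50:]        # overlapping views, offset 50 px
x1 = [59, 60, 61, 62, 63, 5, 30, 90]              # five correct matches, three wrong
x2 = [9, 10, 11, 12, 13, 40, 70, 25]
pano = histogram_stitch(img1, img2, x1, x2)
print(np.array_equal(pano, scene))  # True
```

In this synthetic case the three wrong matches fall in isolated bins, so the clustered correct matches dominate the histogram and the stitched result reproduces the original scene.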

The Proposed Evaluation Methods
To compare the applied similarity measures, the resulting images were evaluated subjectively and objectively. In the subjective evaluation, the number of correct and wrong matching lines produced by each similarity measure was determined visually. In the objective evaluation, the registration process was implemented first, and then PSNR and SNR were calculated.

Subjective Evaluation Method
To subjectively evaluate the EFoV US images produced by the different matching methods described above, we visually compared the registered images with the reference image and counted the correctly matched point pairs.

Objective Evaluation Method
SNR and PSNR were used to objectively compare the registered image obtained with each matching method described above against the reference image, which was registered manually by a physician. The execution time of each method was also measured.

Signal to Noise Ratio
SNR is used in imaging to describe image quality. The sensitivity of an imaging system (digital or film) is typically characterized by the signal level that yields a threshold level of SNR. Traditionally, SNR is defined as the ratio of the mean signal value μ_sig to the standard deviation of the noise σ_noise [21]:
$$ \mathrm{SNR} = \frac{\mu_{sig}}{\sigma_{noise}} $$
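The ratio above requires a choice of what counts as signal and noise when a registered image is compared with a reference. This sketch assumes that the reference image is the signal and that the pixel-wise difference between the reference and the registered image is the noise; the paper may define these differently (for example, on a decibel scale), so treat it as illustrative only.

```python
import numpy as np

def snr(reference, registered):
    """SNR = mean(signal) / std(noise).

    Assumption: signal = reference image, noise = reference - registered.
    """
    ref = np.asarray(reference, dtype=float)
    noise = ref - np.asarray(registered, dtype=float)
    sigma = noise.std()
    if sigma == 0:
        return float("inf")               # identical images: no measurable noise
    return float(ref.mean() / sigma)

ref = np.full((4, 4), 100.0)
reg = ref.copy()
reg[::2] += 10.0                          # rows 0 and 2 differ by 10 grey levels
print(snr(ref, reg))                      # noise std is 5, so SNR = 100 / 5 = 20
```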

Peak Signal to Noise Ratio
PSNR is the ratio of the maximum possible pixel intensity to the power of the distortion, measured by the mean square error (MSE). Whereas MSE represents the cumulative squared error, PSNR is a measure of the peak error [22]:
$$ \mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right) \quad (19) $$
where R is the maximum possible intensity value of the input image data type; for example, for an 8-bit unsigned integer image, R = 255. MSE is the mean square error between the registered image I_reg and the reference image I_ref:
$$ \mathrm{MSE} = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ I_{reg}(m, n) - I_{ref}(m, n) \right]^2 \quad (20) $$
where M and N are the number of rows and columns in the input image, respectively.
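Equations (19) and (20) as a minimal NumPy sketch. Pixels are cast to float before subtraction so that 8-bit images do not wrap around, and `peak` plays the role of R (255 for uint8 data). The function names are illustrative; image libraries provide equivalent built-ins.

```python
import numpy as np

def mse(registered, reference):
    """Equation (20): mean of the squared pixel differences over all M x N pixels."""
    diff = registered.astype(float) - reference.astype(float)
    return float(np.mean(diff ** 2))

def psnr(registered, reference, peak=255.0):
    """Equation (19): 10 * log10(R^2 / MSE); infinite for identical images."""
    err = mse(registered, reference)
    return float("inf") if err == 0 else float(10 * np.log10(peak ** 2 / err))

ref = np.zeros((4, 4), dtype=np.uint8)
reg = np.full((4, 4), 10, dtype=np.uint8)   # every pixel off by 10, so MSE = 100
print(psnr(reg, ref))                        # 10 * log10(255**2 / 100), about 28.13 dB
```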

Implementation and Results
Registration can be performed manually by a physician or automatically. In manual registration, the control points are first selected by hand (at least two matching points in each image), and the registration is then performed. In automatic registration, the accuracy of the control points directly affects the accuracy of the registration result. Accordingly, this research investigates which matching method best increases the accuracy of the matching (control) points. The methods considered are the Euclidean distance, cityblock distance, variation, and correlation.
To compare the applied matching methods, the resulting images were evaluated subjectively and objectively. The objective assessment was performed by calculating the PSNR and SNR between the registered images and the reference image in Figure-8, which was fused manually by an expert. The registered images were obtained using the control points produced by the matching methods mentioned above. The subjective evaluations for each applied method are illustrated in Table-
For the objective evaluation, the registration process must be performed first. Initially, all matching points were used, even though they include both correct and wrong matches. Since the cityblock method showed the best subjective result, its matches were used to perform the registration. Figure-13 shows the final result of using both the correct and wrong matching points from the cityblock method. The EFoV US image in Figure-13 is incorrect and cannot be accepted. To obtain a correct and acceptable EFoV US image despite a number of wrong points, the histogram-based registration algorithm described in the previous section was applied. Figures-14 to 17 show that the correct matching lines are mostly grouped close to one another, which indicates that they have the highest frequency. Thus, accurate EFoV US images were expected whenever there are enough correct matching lines, even if some matching lines are wrong. Figures-(14, 15, 16, and 17) show the registered images for each method, and Table-2 lists the PSNR and SNR results of each method. According to Table-2, the cityblock distance gives the best result, with the highest PSNR and SNR and the lowest execution time. The Euclidean distance gives better results than the variation and correlation methods.
The variation method outperforms the correlation method in PSNR, SNR, and execution time; the correlation method has the lowest PSNR and SNR values and the highest execution time. The same procedure was applied to another two US images containing a comet-tail artifact. The results were evaluated subjectively and objectively against the manually registered (reference) image shown in Figure-18. Table-3 shows the results of the subjective evaluation of each applied method for Figures-(19, 20, 21, and 22).

Conclusions
Ultrasound is widely used in medical imaging, but US images suffer from several types of artifacts, such as shadowing, reverberation, mirror image, poor enhancement, and comet-tail. In addition, structures larger than the transducer can be documented only by sequential images, because the FoV of linear transducers is restricted by the typical 4-6 cm width of the probe. In this paper, several commonly used similarity measures were described and evaluated for EFoV US images. A histogram-based registration process was proposed to generate accurate US EFoV images from the matching points produced by the SURF algorithm. This paper compared the Euclidean distance, cityblock distance, variation, and correlation methods in the matching stage of the SURF algorithm and discussed their effects on the registration of US images. In addition, the matching positions with the highest frequency, found with a histogram, were used to locate the most correct matching lines and thus achieve accurate EFoV US images, even when some matching lines are wrong. Regarding the subjective evaluation of the adjacent pair of US images and the objective evaluation of the registered US images, the results showed