Automatic Pectoral Muscles Detection and Removal in Mammogram Images

The main aim of a Computer-Aided Detection/Diagnosis (CAD) system is to assist radiologists in examining digital mammograms. Digital mammography is the most popular screening technique for the early detection of breast cancer. One of the problems in mammogram analysis is the presence of the high-intensity pectoral muscle region in the upper right or left side of most Medio-Lateral Oblique (MLO) views. Therefore, it is important to remove the pectoral muscle from the image to obtain accurate diagnosis results. The proposed method consists of three main steps. In the first step, noise is reduced using median filtering. In the second step, artifact removal and breast region extraction are performed using the Otsu method. Finally, the pectoral muscle is extracted and removed using the proposed Split Orientation Local Thresholding (SOLTH) algorithm. For this study, a total of 110 mammogram images from the mini-MIAS database (MIAS) were used to evaluate the performance of the proposed method. The results of automatic pectoral muscle detection and removal were reviewed by a radiologist, who judged 90.9% of them acceptable.


Introduction
Breast cancer is the second leading cause of cancer-related death in women. However, early detection and diagnosis of breast cancer substantially increase the chance of survival [1,2]. Mammography is a commonly used screening method and the most effective technique for the early detection of breast cancer; it helps radiologists and doctors not only to diagnose breast cancer but also to follow up patients with the disease. A mammogram is a safe and reasonably accurate X-ray image of the breast. For women at risk of breast cancer, mammogram screening should be done routinely starting at the age of forty to screen for any early signs of the disease [3,4].
The main aim of a Computer-Aided Detection/Diagnosis (CAD) system is to assist radiologists in examining digital mammograms. Automatic detection and segmentation of breast lesions in a mammogram image requires several stages. The pre-processing stage is very important for enhancing image quality and includes several steps. The first step is removing the undesired regions in the background of the mammogram image. These regions mostly represent two types of noise: the first type is high-intensity noise, which corresponds to the rectangular labels, and the second type is low-intensity noise, which corresponds to marks [5]. The pectoral muscle is the other part that needs to be removed from the breast region. It appears as a dense triangular region (i.e. bright pixels) in the top right or top left corner of the Medio-Lateral Oblique (MLO) view of the mammogram [6]. The intensity of the pectoral muscle region is similar to that of breast lesions because the texture of the pectoral muscle closely resembles that of some abnormalities detected in the mammogram. In certain cases, the pectoral muscle intensity is higher than that of the breast lesion region. If the pectoral muscle appears in the MLO view, the number of false positives in CAD detection of breast cancer increases. For this reason, the pectoral muscle must be removed from the mammogram image before any further processing [5].
The main aim of the proposed method is to automatically remove the pectoral muscle from the breast region in the mammogram image. A median filter is used to enhance image quality, then the Otsu method is applied to create a binary mask for removing artifacts (i.e. any artificial product that appears in the image). The SOLTH algorithm is proposed to detect and remove the pectoral muscle from the image, based on splitting, orientation, and a local threshold.

Related Work
In the literature, several approaches have been proposed for detecting and removing the pectoral muscle. Sreedevi and Sherly [5] proposed an approach for segmenting and removing the pectoral muscle that combines global thresholding, Canny edge detection, and connected component labelling. This approach has three limitations. First, the mammogram images are normalized into a range that is more familiar to the senses [16], which means that the technique changes the pixel intensity values. Second, the approach detects the pectoral muscle only in right MLO mammograms; a left MLO mammogram is handled by flipping it so that it appears as a right MLO mammogram. Third, the approach does not itself restore the original pixel intensities of the breast region; instead, they are recovered by matching the binary image to the original grayscale image. The reported results showed an overall accuracy of 90.06%.
Boss et al. [7] proposed a histogram-based, eight-neighborhood connected component labelling method for breast region extraction and pectoral muscle removal. This algorithm has two limitations. First, the breast region is normalized, which changes the intensity of all pixels. Second, a right MLO mammogram is flipped so that it appears as a left MLO mammogram. The accuracy obtained over the MIAS database using this algorithm was 89.5%. Liu et al. [8] proposed a novel approach based on statistical theory, in which the goodness of fit is used to develop a measure of local spatial distribution in the mammogram image. The method is applied pixel by pixel, and the pectoral muscle and the background are separated by detecting their contours in the resulting image. An issue with this method is that the contour of the breast together with the pectoral muscle was identified better than the contour of the pectoral muscle alone. Because the detected pectoral muscle contour overlaps with the breast tissue in most images, no clear borders can be observed in these regions. The accuracy obtained over the MIAS database using this algorithm was 81%.
Makandar and Halalli [9] proposed an algorithm that removes the undesired background using a threshold technique and the pectoral muscle using a modified region growing technique. The algorithm was tested on the mini-MIAS database, where the Region of Interest (ROI) was extracted accurately from all the images, and it proved suitable for CAD systems for the early detection of breast cancer. In the experiments, 100 images were selected from the MIAS database; 97 images were properly or over segmented and 3 images were under segmented. As a result, the accuracy was 97%.
Santle et al. [10] proposed an automatic method that uses the watershed transformation to identify the pectoral muscle in MLO mammograms. The watershed transformation of a mammogram has interesting properties, including the appearance of a unique watershed line corresponding to the pectoral muscle edge. An issue with this method is that it is designed to identify the pectoral muscle in left MLO mammograms only. A right MLO mammogram is processed by rotating it so that it appears as a left MLO mammogram; once the pectoral muscle has been identified, the original position is restored by rotating the image again. This avoids developing two separate approaches for left and right MLO mammograms. The validation accuracy obtained by this method over a database of 84 mammograms was 85%.
Mirzaalian et al. [11] proposed a new method for identifying the pectoral muscle in MLO mammograms based on a non-linear diffusion algorithm. They tested it on 90 mammogram images from the mini-MIAS database. A limitation of this approach is that it is designed to identify the pectoral muscle in left MLO mammograms. This region has a higher density than the other regions, and it is extracted using thresholding. The threshold is chosen as the average intensity of the upper-left pixels in the image resulting from the iterative non-linear diffusion.

Materials and methods
The proposed method was implemented in C# using Visual Studio 2010. It includes three major steps, as shown in Figure 1. In the first step, noise is removed from the mammogram image using the median filter. In the second step, the Otsu method is applied to remove the artifacts and annotations found in the background of the mammogram image. Finally, the proposed SOLTH algorithm (Splitting, Orienting, and Local Threshold) is used to detect and remove the pectoral muscle.

Mammogram Image Database
The mammogram images used in this paper are taken from the Mammography Image Analysis Society (MIAS) database, produced by a UK research organization related to breast cancer research and freely accessible for scientific purposes [12]. The images were created by film-screen mammographic imaging in the United Kingdom National Breast Screening Program (NBSP) [9], and the database consists of 322 MLO view mammograms (right and left views). The images are grayscale, with a size of 1024 × 1024 pixels and 8 bits per pixel, and contain artifact noise. Table 1 describes the mammogram images in the MIAS dataset, which are classified into abnormal (116: 64 benign and 52 malignant) and normal (206). The dataset is also split into three types of background tissue, based on the intensity of the mammogram (i.e., Fatty, Fatty-glandular, and Dense-glandular). The severity of an abnormality is labelled as B (Benign) or M (Malignant) [12]. The main components of a mammography image are shown in Figure 2.

Figure 2 - An example of the main components of a MIAS dataset image (mdb058) [12].

Mammogram pre-processing
Pre-processing of a mammogram image is a procedure that improves image quality so that lesions can be detected without losing important data. Mammogram images contain many varieties of noise, and the main aim of pre-processing is to prepare the image for further processing by removing or minimizing unnecessary components. Two types of noise were observed in the images: (a) high-intensity noise regions, including bright rectangular labels, and (b) low-intensity labels. It is therefore necessary to apply pre-processing techniques before removing the pectoral muscle [13]. In this paper, the following pre-processing phases are used.

Median filtering
The median filter is a non-linear spatial filter that belongs to the order-statistic filter category. Because it reduces noise with less blurring, it is more powerful than traditional linear smoothing filters and preserves sharp edges. Median filters are also efficient, and their window size can be changed without the need for conversion [14]. The value selected by the median filter is always exactly equal to one of the existing brightness values, so there is no round-off error when working with integer brightness values, in contrast to other filters [15,16]. The filter can be used on mammograms to lower the quantity of noise while preserving corners. This is achieved by selecting a neighbourhood around each pixel (3 x 3, 5 x 5, 7 x 7, etc.), placing all pixel values in that neighbourhood in an array called an object array, sorting the array, and taking its middle value as the output for that pixel. The output image array is the collection of the median values of the object arrays acquired for all pixels [17]. The median filter therefore runs as a sequence of loops that cover the full image array.
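The filtering loop described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' C# implementation; the function name `median_filter`, the list-of-lists image representation, and the border handling (clipping the window at the image edge) are assumptions made for this sketch.

```python
from statistics import median_low


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of grayscale values."""
    if size % 2 == 0:
        raise ValueError("size must be odd (3, 5, 7, ...)")
    height, width = len(image), len(image[0])
    half = size // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood (the "object array").
            # Assumption: the window is clipped at the image borders.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            # median_low always returns a value that is present in the
            # window, so no new brightness value is ever created.
            output[y][x] = median_low(window)
    return output


# An isolated bright pixel is removed ...
spike = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(spike))
# ... while a sharp vertical edge is left unchanged.
step = [[0, 0, 255, 255] for _ in range(4)]
print(median_filter(step) == step)
```

Using `median_low` instead of an averaging median keeps the property stated in the text: every output value is one of the brightness values already present in the window.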

Artifacts Removal
Artifacts are parts of a mammogram image and are of two kinds: high-intensity artifacts, such as the rectangular labels, and low-intensity artifacts, such as the marks [7]. The majority of the mammography images in the dataset used in our methodology contain this type of noise. Artifacts are undesirable and, because of their high intensity, may negatively affect the results of pectoral muscle removal. Therefore, it is better to remove them from the mammogram images before removing the pectoral muscle. The process of artifact removal comprises three main steps, as follows:
i. Binarization of the mammogram image using the Otsu method, an easy and efficient technique in which a global threshold value is selected automatically for each input image to convert the grayscale image to a binary image. The pixels of the image are thus divided into two classes, foreground and background [18]. The foreground is the white part (1's) and includes the breast region with the artifacts, whereas the black part (0's) is the mammogram background, as shown in Figure 3b.
ii. Calculation of the size of the largest object within the mammogram image, which represents the breast portion with the pectoral muscle, by determining the width and height of the object to be retained and removing all objects that are smaller than the specified width and height. The required width of the object is half the width of the image, and the required height is the height of the image minus 150 pixels, according to the following equations:
Width of the object = 1/2 × Width of the image (1)
Height of the object = Height of the image - 150 (2)
These values were determined experimentally and proved effective for the majority of images. In other words, any object in the image that does not comply with the specified measurements is deleted, and the object whose measurements correspond to the specified measurements is retained, as shown in Figure 3c.
iii. Restoring the original pixel intensities, which is achieved by matching the binary image with the original grayscale image. A pixel that is black in the binary image remains black, while a pixel that is white in the binary image is restored from the original image at the same location. The final output of this process is an image that includes the breast region with the pectoral muscle and no noise artifacts, as shown in Figure 3d.
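The three artifact-removal steps can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' C# code: the function names, the use of 4-connected components, and the reading of "size of the object" as its bounding box are assumptions, and the 150-pixel margin is exposed as the hypothetical parameter `height_margin` so the sketch can be tried on small images.

```python
from collections import deque


def otsu_threshold(image):
    """Return the Otsu threshold t; pixels with value > t are foreground."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def keep_large_objects(mask, min_width, min_height):
    """Keep connected objects whose bounding box meets the minimum size."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    kept = [[0] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over one 4-connected object (assumption).
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            if (max(xs) - min(xs) + 1 >= min_width
                    and max(ys) - min(ys) + 1 >= min_height):
                for y, x in pixels:
                    kept[y][x] = 1
    return kept


def remove_artifacts(image, height_margin=150):
    """Steps i-iii: binarize, drop small objects, restore intensities."""
    height, width = len(image), len(image[0])
    t = otsu_threshold(image)                                   # step i
    mask = [[1 if v > t else 0 for v in row] for row in image]
    kept = keep_large_objects(mask, width // 2, height - height_margin)  # step ii
    # Step iii: white mask pixels take the original intensity, black stay 0.
    return [[v if k else 0 for v, k in zip(row, kept_row)]
            for row, kept_row in zip(image, kept)]
```

On a 1024 × 1024 mini-MIAS image the defaults correspond to equations (1) and (2), i.e. a minimum object size of 512 × 874 pixels.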

Pectoral Muscle Detection and Removal
The pectoral muscle is a region of homogeneous intensity located in the upper left or upper right portion of the breast in most MLO mammography images. The appearance of the pectoral muscle in the MLO view is important because it shows the specialist that the patient's breast was imaged completely for diagnostic purposes. On the other hand, when lesion regions are detected by segmenting the mammogram, the pectoral muscle can produce false positives: its intensity is high relative to that of the lesions, which vary in intensity and in some cases are as bright as the pectoral muscle. Thus, it is very important to detect and remove the pectoral muscle automatically from the original mammogram. This was achieved using the proposed algorithm (SOLTH), which consists of the following steps:
Step 1: Splitting the mammogram image into four parts (ULP, URP, LLP, and LRP). The next steps are performed only on the upper parts, ULP and URP, which represent the possible locations of the pectoral muscle in the mammography image, as shown in Figure 4.
Step 2: Finding the orientation of the pectoral muscle in the upper parts (ULP and URP) produced by the splitting process described above. This is performed by counting the non-black pixels in each of the two parts and comparing the counts. The part with the higher number of non-black pixels (ULP or URP) gives the orientation of the pectoral muscle in the image, as shown in the corresponding figure.
Step 3: Applying the local threshold (thr): every pixel whose intensity is greater than or equal to 150 is converted to black. This thr value was carefully selected through experiments and proved successful with most images. In this process the pectoral muscle is removed from the ULP, as shown in Figure 6.
Figure 6 - Results of removing the pectoral muscle: (a) the ULP; (b) the ULP after removing the pectoral muscle.
Step 4: To view the pectoral muscle in binary format, a pixel with a value of 255 is written to a new empty image at the same position whenever a pixel is set to black (value of 0) in the previous step. In the same way, the pectoral muscle can be shown with its original intensity, as presented in the corresponding figure.
Step 5: The last step reassembles the four parts (ULP, URP, LLP, and LRP) into the image that represents the breast region after the removal of the pectoral muscle, as shown in the corresponding figure.
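The five steps can be summarised in a short Python sketch. It is an assumption-laden illustration, not the authors' C# implementation: the function name `solth`, the split at the exact image midpoints, and the tie-break toward the ULP when both upper parts contain the same number of non-black pixels are choices made here for the example.

```python
def solth(image, thr=150):
    """Return (cleaned image, binary muscle mask, chosen side 'ULP' or 'URP')."""
    height, width = len(image), len(image[0])
    mid_y, mid_x = height // 2, width // 2

    # Step 1: split into four parts. Only the column ranges of the two
    # upper parts are needed; the lower parts are never modified.
    upper_parts = {"ULP": (0, mid_x), "URP": (mid_x, width)}

    # Step 2: the upper part with more non-black pixels holds the muscle.
    def non_black(x_start, x_end):
        return sum(1 for y in range(mid_y)
                   for x in range(x_start, x_end) if image[y][x] > 0)

    side = max(upper_parts, key=lambda name: non_black(*upper_parts[name]))
    x_start, x_end = upper_parts[side]

    cleaned = [row[:] for row in image]
    muscle_mask = [[0] * width for _ in range(height)]
    for y in range(mid_y):
        for x in range(x_start, x_end):
            if cleaned[y][x] >= thr:
                cleaned[y][x] = 0         # Step 3: blacken bright pixels
                muscle_mask[y][x] = 255   # Step 4: record them in a binary image

    # Step 5: because the chosen part was edited inside a copy of the full
    # image, the four parts are already reassembled in `cleaned`.
    return cleaned, muscle_mask, side


image = [
    [200, 180,   0, 0],
    [160, 100,   0, 0],
    [120, 110,  90, 0],
    [100, 100, 200, 0],
]
cleaned, muscle_mask, side = solth(image)
print(side)
print(cleaned)
```

In this toy example the bright pixel in the lower half (value 200) is left untouched, which reflects the point of the splitting step: the threshold is applied locally to one upper part rather than to the whole breast region.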

Experimental Results
The proposed algorithm was tested on 110 mammogram images from the mini-MIAS database. The images are in BMP format with a size of 1024 × 1024 pixels and 8 bits per pixel. The work was implemented in C#. Images of most tissue types, such as fatty, fatty-glandular, and dense-glandular, were included. A method for extracting the ROI and removing the pectoral muscle automatically was proposed. The results were examined by a radiologist, and the accuracy obtained was 90.9%. Out of 110 images, 100 images fell in the Accepted category (no remaining pectoral muscle tissue in the breast part) and 10 images fell in the Unacceptable category (remaining pectoral muscle tissue in the breast part); 6 images were removed (absence of the pectoral muscle in the MLO view due to radiographic errors). Figure 9 shows a sample of the results of pre-processing using median filtering followed by artifact removal using the Otsu method. Figure 10 shows the results of pectoral muscle removal using the SOLTH algorithm.

Figure 10 - Results of pectoral muscle region detection and removal.
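As a quick check of the reported figure, the accuracy can be recomputed from the category counts. The sketch assumes that the 6 removed images are not counted among the 110 evaluated images, which the text does not state explicitly; the variable names are illustrative.

```python
accepted = 100      # no remaining pectoral muscle tissue
unacceptable = 10   # remaining pectoral muscle tissue

evaluated = accepted + unacceptable       # 110 images
accuracy = 100 * accepted / evaluated     # percentage of accepted results

print(f"{accuracy:.1f}%")  # prints 90.9%
```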

Conclusions
Digital mammography is a technique used for the early detection of breast cancer, which is one of the leading causes of death for women. Pectoral muscle tissue appears in mammograms with a pixel intensity similar to that of cancerous tissue. Therefore, it must be detected and removed from the breast region of the mammogram before any processing is performed to detect cancer in the breast region. The main objective of this paper was to propose an efficient algorithm for pectoral muscle removal in MLO mammograms. The proposed algorithm is based on splitting the mammography image into four parts: the ULP, URP, LLP, and LRP. The orientation of the pectoral muscle is then found in the upper parts (ULP and URP), and the local threshold (thr) is used to remove the pectoral muscle. The algorithm was tested on 110 mammogram images taken from the MIAS database and achieved an accuracy of 90.9%. It can be used effectively to detect and remove the pectoral muscle without losing any information from the remainder of the mammogram. After the pectoral muscle has been removed from the mammogram, further processing is confined to the breast region alone.