Detection and Classification of Osteoarthritis in the Knee Joint Using Transfer Learning with Convolutional Neural Networks (CNNs)

Osteoarthritis (OA) is a disease of the human joints, especially the knee joint, which bears much of the body's weight. The disease leads to rupture and degeneration of parts of the cartilage in the knee joint, which causes severe pain. It can be diagnosed from X-ray images. Deep learning has become a popular approach to medical problems due to its rapid progress in recent years. This research aims to design and build a classification system that reduces the burden on doctors and helps radiologists assess the severity of the disease, make an accurate diagnosis, and prescribe the correct treatment. Deep learning-based approaches, such as Convolutional Neural Networks (CNNs), have been used to detect knee OA using transfer learning with fine-tuning. This paper proposes three pre-trained networks (VGG16, VGG19


Introduction
Osteoarthritis (OA) is the most prevalent musculoskeletal condition, characterized by joint inflammation and significant structural abnormalities [1,2]. It is the most common form of arthritis and joint disease, one of the most typical causes of illness in elderly and overweight people, and the leading cause of adult disability. The condition primarily affects adults over 45, with women affected more often than men. As the cartilage is destroyed, the bones grind against each other, producing extreme pain and inflammation. The bone then thickens and forms spurs along its edges [3].
Imaging has become more important in the treatment of OA over the last decade. It can be used to diagnose, monitor, and measure the extent of joint deterioration caused by OA. X-ray is an inexpensive imaging technique and has been the gold standard for evaluating patients with OA. On X-ray, the primary pathological signs of knee OA are Joint Space Narrowing (JSN) and the growth of osteophytes [4,5].
The Kellgren and Lawrence (KL) scoring system is the most widely used knee OA severity scoring system and was adopted by the World Health Organization (WHO) in 1961 [6]. The KL system divides knee OA into five grades, ranging from 0 to 4 (Normal, Doubtful, Mild, Moderate, Severe) [7,8]. Table 1 describes the grades of OA [9], and Figure 1 shows the five grades [6].

Grade 0
Normal, no signs of osteoarthritis on the radiographs.

Grade 1
Doubtful, possible joint space narrowing and possible osteophytic lipping.

Grade 2
Mild, noticeable osteophytes with possible joint space narrowing.

Grade 3
Moderate, many osteophytes, a noticeable narrowing of joint space, tiny pseudocystic regions with sclerotic walls, and a possible deformation of bone shape.

Grade 4
Severe, large osteophytes, a significant narrowing of joint space, extreme sclerosis, and clear abnormalities of bone shape.

Deep learning is a machine learning technique based on neural networks that has been developed to improve classification performance. Convolutional Neural Networks (CNNs) are one type of deep learning approach. They have succeeded in image classification and identification tasks and can even mimic aspects of human vision [10]. Deep learning techniques such as CNNs have recently obtained state-of-the-art results in various image classification problems, and applying CNN-based algorithms to the KL-grade classification problem has produced encouraging results [11].
Deep learning has the advantage of learning feature extraction directly from the data, without a separate hand-designed technique [12]. This study uses CNNs with transfer learning, training on the dataset with well-known pre-trained networks: VGG16, VGG19, and ResNet50. These networks were trained on ImageNet, one of the best-known large datasets, which contains over 14 million high-resolution labeled images from over 22,000 classes. Using these models with weights pre-trained on ImageNet also gives them a similar foundation for comparison [13]. Data augmentation and dropout are used to reduce overfitting, which occurs when a model performs well on the training data but poorly on the test data [14].

Related works
Many works on detecting and classifying OA in the knee joint have been reported, using both deep learning and machine learning techniques. Some of them are described here. In 2018, Abdelbasset Brahima et al. used a circular Fourier filter to pre-process the X-ray images in the Fourier domain. The data were then normalized with a novel approach based on predictive modeling with multivariate linear regression (MLR) to reduce the variability between OA and healthy subjects. To reduce dimensionality, independent component analysis (ICA) was applied at the feature selection/extraction step. Finally, Naive Bayes and random forest classifiers were used for classification. The findings show that the suggested approach reaches a predictive classification performance of 82.98% for OA detection [15].
In 2019, Pingjun Chen et al. employed two deep CNNs to predict the severity of knee OA using the Kellgren-Lawrence grading scale. First, they used a customized one-stage YOLOv2 network to detect knee joints based on the size distribution of knee joints in X-ray images. Second, they used a novel adjustable ordinal loss to fine-tune popular CNN models, including ResNet, VGG, and DenseNet variants as well as InceptionV3, to classify the detected knee joint images. The best classification accuracy, 69.7%, was achieved by the fine-tuned VGG19 model with the proposed ordinal loss [6]. In 2019, Rima Tri Wahyuningrum et al. pre-processed the input images, extracted features using a CNN, and used LSTM (Long Short-Term Memory) for classification. For pre-processing, a region of 400 × 100 pixels around the knee joint is cropped manually. VGG-16 is used for feature extraction, and the extracted features are then fed to the LSTM as its input. Finally, the LSTM model classifies the severity of knee OA. This approach reaches 75.28% [4].
In 2020, Bofei Zhang et al. used ResNet (Residual Neural Network) to locate the knee joint in X-ray images and then combined ResNet with CBAM (Convolutional Block Attention Module) to produce an automatic KL score. The suggested model reached a multi-class accuracy of 74.81% [11].
In 2020, Kamali C et al. employed the U-Net model for cartilage segmentation and machine learning classifiers, namely SVM and KNN, for OA severity classification. They used KL grading to train the algorithms to assess the severity of knee osteoarthritis. The SVM classifier was more accurate than the KNN classifier, with an accuracy of 73% versus 70.5% for KNN [16].
In 2021, Albert Swiecicki et al. created an automated deep learning method that assesses knee osteoarthritis severity according to the KL grading system by combining the Lateral (LAT) and Posterior-Anterior (PA) views of the knee X-ray. Their deep learning-based technique works in two steps: (1) detection of the knee joints in the X-ray using Faster R-CNN, and (2) classification using a multi-input CNN that takes the two input images (LAT, PA). The model reaches a multi-class accuracy of 71.90% [17]. In 2021, Yifan Wang et al. described a highly automated deep learning-based technique for diagnosing osteoarthritis. Transfer learning from the object detection domain was applied to the segmentation of the knee joint region. They used YOLO for object detection to extract the knee region of interest (ROI) and a ResNet50 CNN backbone to extract feature maps from the cropped knee X-ray. The extracted feature maps were flattened and recomposed as a sequence, and a visual transformer was used to exploit correlations between different local regions for the final classification. The proposed method reaches an accuracy of 69.18% [18].

Problem Statement
The main problems addressed in this article are as follows:
• The traditional way of using transfer learning, represented by the VGG16, VGG19, and ResNet50 networks, does not necessarily lead to optimal results, especially in medical image classification tasks. A computer-aided diagnosis system was therefore developed to reduce the burden on doctors and help radiologists assess the severity of the pain from X-ray images.
• Classifying knee joint X-rays according to their classes (normal, doubtful, mild, moderate, severe) with high sensitivity and more accurate results.
• Handling a small training dataset.
This paper is structured as follows: Section 2 presents the materials and methods, Section 3 shows the results and discussion, and Section 4 contains the conclusion.

Materials and Methods

Datasets
In this study, 1650 knee joint X-ray images from the Mendeley Data platform were used, in DICOM (Digital Imaging and Communications in Medicine) format, the accepted standard for managing and communicating medical images and related data. Two medical specialists manually labeled each knee X-ray according to the Kellgren and Lawrence grades. Both specialists are highly skilled orthopedic surgeons who review between 70 and 100 radiographs every day [19].

Convolutional Neural Network (CNN)
Convolutional neural networks (CNNs) are a type of deep learning technology for working with images that can eliminate the need for handcrafted feature extractors [20]. CNNs have been employed for various image classification applications, and CNN architectures for medical image analysis have been developed in recent studies [21].
CNNs are artificial neural networks (ANNs) that use the convolution operation in at least one layer. Yann LeCun developed the first CNN in 1990, although it attracted little attention at the time.
A general CNN architecture looks like the one shown in Figure 2 and consists of distinct types of layers. Building a CNN always involves three main types of layers: (1) convolution, (2) pooling, and (3) fully connected [22].

Convolutional layer
The convolutional layer is the basic layer for extracting information from an input image [23]. It holds a set of filters and produces feature maps by performing a convolution operation between these filters and its input (the image) [17,18].
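To make the filter-and-feature-map idea concrete, the following is a minimal pure-Python sketch of a single-channel convolution with no padding and a stride of 1. The function name `conv2d_valid`, the toy image, and the edge filter are illustrative assumptions, not taken from the networks used in this paper. As in most CNN libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the image patch and sum the result.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to left-to-right intensity increases.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge lies, which is the sense in which a filter "extracts" a feature from the image.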

Pooling layer
The pooling layer reduces the spatial dimensions of the representation produced by the convolutional layer [26]. Max pooling is the most prevalent type of pooling. The max-pooling layer moves a window across its input and keeps the window's maximum value while ignoring all other values [27].
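A minimal sketch of max pooling is given here, assuming a 2 × 2 window with a stride of 2 (the setting used in the VGG family). The function name `max_pool2d` and the input values are illustrative only.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving `stride` cells at a time."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 5],
    [0, 1, 3, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

Each spatial dimension is halved, so the 4 × 4 input becomes a 2 × 2 output that keeps only the strongest response in each window.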

Fully connected layer
After a series of convolution and pooling layers, the image's feature map is flattened, and all of its neurons are fed into a fully connected network [28]. Finally, the output (final) layer, Softmax, is the classification layer. Its task is to assess whether or not an image belongs to a specified class [29].
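The softmax output can be sketched as follows. The class names follow the KL grades used in this paper, while the raw scores (`logits`) are made-up values for illustration.

```python
import math

KL_CLASSES = ["Normal", "Doubtful", "Mild", "Moderate", "Severe"]


def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    highest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable without changing the result.
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores produced by the last fully connected layer for one image.
logits = [0.5, 1.0, 3.0, 1.5, 0.2]

probs = softmax(logits)
predicted = KL_CLASSES[probs.index(max(probs))]
print(predicted)  # Mild
```

The predicted class is simply the one with the highest probability, here the class with the largest raw score.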

Transfer learning (TL)
In the same way that people use prior knowledge to understand and solve new problems, a neural network that has been trained and tested on one dataset can reuse what it learned to train and test on other datasets. This technique is called Transfer Learning [30].
In transfer learning, the neural network uses previously acquired information to handle new problems. This information is represented by the network's weights and features. The network keeps its previous weights and features and can achieve good performance on the target task. VGG16, VGG19, and ResNet50 are pre-trained models available in the Keras library and are the transfer learning models employed in this work [31].

VGG
VGG (Visual Geometry Group) is one of the popular CNN models previously trained on the large ImageNet dataset. K. Simonyan and A. Zisserman created this model in 2014 [25,26]. The input to the VGG model is an RGB image with a fixed size of 224 × 224. As preprocessing, the mean RGB value computed on the training set is subtracted from each pixel. The number of convolutional layers varies between VGG variants. The VGG-16 variant has 13 convolutional layers, 5 max-pooling layers, and 3 fully connected layers. Each of the first two fully connected layers has 4096 nodes, while the last fully connected layer has 1000 nodes for the output. The ReLU activation function is used in all convolutional layers [27,28].
VGG-16 and VGG-19 are the most common VGG models, consisting of 16 and 19 weight layers, respectively. VGG-19 differs from VGG-16 in that it contains one additional convolutional layer in each of the last three convolutional blocks [32]. Figure 3 shows the architecture of VGG16 and VGG19 [14].
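The layer counts can be checked with a short calculation. The sketch below lists the number of convolutional layers in each of the five blocks of the standard VGG-16 and VGG-19 configurations; the dictionary and function names are illustrative.

```python
# Convolutional layers in each of the five blocks (one max-pooling layer follows each block).
VGG_CONV_LAYERS_PER_BLOCK = {
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],  # one extra layer in each of the last three blocks
}
FULLY_CONNECTED_LAYERS = 3


def weight_layer_count(name):
    """Count the layers with trainable weights: convolutional plus fully connected."""
    return sum(VGG_CONV_LAYERS_PER_BLOCK[name]) + FULLY_CONNECTED_LAYERS


for name in VGG_CONV_LAYERS_PER_BLOCK:
    print(name, weight_layer_count(name))
```

Pooling layers have no weights, which is why they are not counted in the "16" and "19" of the model names.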

ResNet
A deep residual neural network, also known as ResNet, performs well with very deep designs and provides a more direct path for information to flow across the network.
In a residual neural network, shortcut (skip) connections run parallel to the regular convolutional layers, enabling it to recognize global features. The shortcut connection adds the input x to the output of a few weight layers. These shortcut links let the network bypass layers that are not useful during training, so the effective number of layers is tuned for rapid training. Figure 4 shows a single ResNet block [35].
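The shortcut connection can be written as y = F(x) + x, where F is the stack of weight layers. The sketch below is a deliberately simplified illustration that uses small dense layers in place of the convolutions and batch normalization found in a real ResNet block; all names and values are hypothetical.

```python
def relu(vector):
    return [max(0.0, value) for value in vector]


def linear(vector, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two weight layers with a ReLU in between."""
    out = relu(linear(x, w1))
    out = linear(out, w2)
    # Shortcut connection: add the block's input to the output of the weight layers.
    return relu([o + xi for o, xi in zip(out, x)])


x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) is zero, so the block passes its input through unchanged.
y = residual_block(x, zero_weights, zero_weights)
print(y)  # [1.0, 2.0, 3.0]
```

The zero-weight case shows why a residual block can behave like an identity mapping: the information still flows through the shortcut even when the weight layers contribute nothing.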

Proposed method
This section presents the proposed method for the automatic classification of OA in the knee joint. The task is accomplished by using the VGG16, VGG19, and ResNet50 CNN models with fine-tuning to train on the dataset. The work is divided into three stages: preparing the dataset, pre-processing the data, and training and classification, as shown in Figure 5. After the X-ray images are obtained from the X-ray machine, the input images are resized to a width of 300 and a height of 224, and the dataset is divided into 90% for training and 10% for validation. Data augmentation is used to create many copies of a single image, which makes the trained model more robust and helps prevent overfitting. The augmentation settings used in our experiment are rotation range=10, width shifting=0.1, height shifting=0.1, zooming=0.2, horizontal flipping=True, and vertical flipping=True.
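The augmentation settings and the 90/10 split can be expressed with the Keras `ImageDataGenerator` roughly as follows. This is a sketch under several assumptions: the images have been exported to a format Keras can read (such as PNG) and arranged in one folder per class, the `data_dir` path and the function name are hypothetical, the batch size of 32 is the value stated for training, and the validation images are left unaugmented, which the paper does not specify. The availability of `ImageDataGenerator` also depends on the installed TensorFlow/Keras version.

```python
IMG_HEIGHT, IMG_WIDTH = 224, 300  # images are resized to 300 wide by 224 high
VALIDATION_FRACTION = 0.1         # 90% training, 10% validation

# Augmentation settings listed in the text, using the Keras argument names.
AUGMENTATION_KWARGS = dict(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)


def make_generators(data_dir, batch_size=32):
    """Return (train, validation) iterators over a folder with one sub-folder per class."""
    # Imported here so the settings above can be inspected without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(
        validation_split=VALIDATION_FRACTION, **AUGMENTATION_KWARGS
    )
    # Assumption: validation images are not augmented.
    val_datagen = ImageDataGenerator(validation_split=VALIDATION_FRACTION)

    common = dict(
        target_size=(IMG_HEIGHT, IMG_WIDTH),  # Keras expects (height, width)
        batch_size=batch_size,
        class_mode="categorical",
    )
    train = train_datagen.flow_from_directory(data_dir, subset="training", **common)
    val = val_datagen.flow_from_directory(data_dir, subset="validation", **common)
    return train, val
```

A call such as `train_gen, val_gen = make_generators("knee_xrays/")` would then yield batches of augmented training images and untouched validation images.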
The proposed method uses the VGG16, VGG19, and ResNet50 CNNs by freezing the convolution and pooling layers, which act as the feature extractor, and replacing the fully connected (FC) layers of each network with a flatten layer, 2 fully connected layers that use the ReLU activation function, and an output layer that uses a softmax activation function to automate the diagnosis into 5 classes. Transfer learning brings the knowledge of the previously trained models into our study. Figure 6 shows the network architecture of the proposed method.
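A minimal Keras sketch of this architecture is given here. It is not the authors' code: the layer sizes (512 and 256) and the dropout rate (0.2) are the values stated in this paper, but the position of the dropout layers, the function names, and the use of `Sequential` are assumptions. The helper `flattened_features` applies to the VGG backbones only, which halve each spatial dimension five times and end with 512 channels.

```python
NUM_CLASSES = 5               # KL grades 0 to 4
INPUT_SHAPE = (224, 300, 3)   # (height, width, channels)


def flattened_features(height, width, channels=512, num_pools=5):
    """Size of the flattened VGG feature map after five 2x2 max-pooling layers."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height * width * channels


def build_model(backbone="VGG16", weights="imagenet"):
    """Frozen pre-trained backbone followed by a new classification head."""
    # Imported here so the helper above can be used without TensorFlow installed.
    from tensorflow.keras import applications, layers, models

    base_class = getattr(applications, backbone)  # "VGG16", "VGG19", or "ResNet50"
    base = base_class(include_top=False, weights=weights, input_shape=INPUT_SHAPE)
    base.trainable = False  # freeze the convolution and pooling layers

    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.2),  # assumption: dropout placed after each FC layer
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


print(flattened_features(224, 300))  # 32256 inputs to the first FC layer (VGG backbones)
```

With a 224 × 300 input, the VGG feature map is 7 × 9 × 512, so the first new fully connected layer receives 32,256 values per image.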

Figure 6: Architecture of the proposed method
We trained the two FC layers for 200 epochs. They contain 512 and 256 neurons, respectively. We used a dropout rate of 0.2 to reduce overfitting; dropout works by randomly dropping out a few neurons, that is, setting them to zero, during training. The network was trained with a batch size of 32 and the Adam optimizer, a gradient-descent-based optimization algorithm for training deep learning models, with a learning rate of 1e-4. The categorical_crossentropy loss function was used to measure the loss of the model. Two callback functions from Keras were used: early stopping and checkpoint. Early stopping ends the training session early. It takes the performance measure to monitor and a trigger condition, and terminates training when the condition is met. The checkpoint callback saves the best weights during training. The final classifier layer with softmax has 5 output classes, according to the KL grading system. The steps of the proposed model are as follows:
• The dataset consists of 5 folders: normal, doubtful, mild, moderate, and severe X-ray images.
• Resize each image to a width of 300 and a height of 224.
• Pre-process the input images using ImageDataGenerator from the Keras library to define the image data augmentation. Because the dataset has a small number of images, augmentation is used to generate more images through rotation, zooming, and horizontal and vertical flipping.
• Divide the pre-processed images into training and validation sets. The training data is used to train the model, and the model is tested on the validation data. The dataset was split into 90% for training and 10% for validation.
• Change the structure of the network by adding 4 layers: a flatten layer, 2 FC layers, and a softmax layer that classifies the output into 5 classes.
• Use the checkpoint and early stopping functions from the callbacks in the Keras library to get the best accuracy for the trained model and save it in a .h5 file.
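The training configuration described above can be sketched in Keras as follows. The epochs, batch size, learning rate, and loss are the values stated in this paper; the monitored metric (`val_accuracy`), the `patience` value, the file name `best_model.h5`, and the function name are assumptions, since the paper does not report them.

```python
# Values stated in the text.
TRAINING_CONFIG = {
    "epochs": 200,
    "batch_size": 32,
    "learning_rate": 1e-4,
    "loss": "categorical_crossentropy",
}


def compile_with_callbacks(model, checkpoint_path="best_model.h5", patience=20):
    """Compile `model` and return the early stopping and checkpoint callbacks."""
    # Imported here so the configuration can be inspected without TensorFlow installed.
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
    from tensorflow.keras.optimizers import Adam

    model.compile(
        optimizer=Adam(learning_rate=TRAINING_CONFIG["learning_rate"]),
        loss=TRAINING_CONFIG["loss"],
        metrics=["accuracy"],
    )
    return [
        # Assumption: stop when validation accuracy has not improved for `patience` epochs.
        EarlyStopping(monitor="val_accuracy", patience=patience),
        # Keep only the weights with the best validation accuracy seen so far.
        ModelCheckpoint(checkpoint_path, monitor="val_accuracy", save_best_only=True),
    ]
```

Training would then be started with a call along the lines of `model.fit(train_gen, validation_data=val_gen, epochs=TRAINING_CONFIG["epochs"], callbacks=callbacks)`, where `train_gen` and `val_gen` are the training and validation iterators.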

Results and Discussion
In this section, the results obtained with deep learning using the transfer learning models (VGG16, VGG19, and ResNet50) are presented and discussed. The experiments were performed on Windows 10 Pro 64-bit with an Intel Core i7 processor and 8 GB of RAM. The code was written in Python 3.8 in a Jupyter notebook. Figure 7 shows the confusion matrix used to evaluate the model's accuracy. The three pre-trained CNN networks (VGG16, VGG19, and ResNet50), trained with a learning rate of 0.0001, data augmentation, and a dropout of 0.2, achieved the training and validation accuracies reported in the accompanying table.
For VGG16, the validation accuracy at the last epoch (200) was 85.45%, whereas the reported validation accuracy is 87.27%. The difference is due to the checkpoint callback, which stored the weights that gave the highest validation accuracy, reached at epoch 188.
Figure 9 shows the training and validation accuracy and loss of the VGG19 network. Training stopped at epoch 120, although the number of training epochs was set to 200, because of the early stopping callback. The validation accuracy at the last epoch (120) was 86.66%, whereas the reported validation accuracy is 89.69%, again because the checkpoint callback stored the weights that gave the highest validation accuracy, reached at epoch 98.
Figure 10 shows the training and validation accuracy and loss of the ResNet50 network. The validation accuracy then ranges between 80% and 90%. The validation accuracy at the last epoch (200) was 87.27%, whereas the reported validation accuracy is 91.51%, because the checkpoint callback stored the weights that gave the highest validation accuracy, reached at epoch 197.
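The difference between the accuracy at the last epoch and the accuracy of the saved checkpoint can be illustrated with a small pure-Python simulation of the two callbacks. The accuracy values and the patience of 3 are made-up numbers chosen to keep the example short; they are not the training curves of this study.

```python
def simulate_training(val_accuracies, patience):
    """Track the best checkpoint and the early stopping point for a non-empty accuracy curve."""
    best_acc = float("-inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        last_epoch = epoch
        if acc > best_acc:
            # Checkpoint: remember the best validation accuracy and when it occurred.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # Early stopping: no improvement for `patience` consecutive epochs.
            break
    return {
        "best_epoch": best_epoch,
        "best_acc": best_acc,
        "last_epoch": last_epoch,
        "last_acc": val_accuracies[last_epoch - 1],
    }


# Hypothetical validation accuracy per epoch.
history = [0.70, 0.80, 0.85, 0.90, 0.88, 0.87, 0.86, 0.89]

result = simulate_training(history, patience=3)
print(result)
# {'best_epoch': 4, 'best_acc': 0.9, 'last_epoch': 7, 'last_acc': 0.86}
```

In this toy run, training stops at epoch 7 with an accuracy of 0.86, but the saved checkpoint comes from epoch 4 with an accuracy of 0.90, which mirrors the gap between the last-epoch and checkpoint accuracies described above.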
Because of the high accuracy obtained with the ResNet50 model, this study outperforms previous studies that used classification based on deep learning and machine learning, as shown in Table 3.

Conclusions
In this study, CNN models (VGG16, VGG19, and ResNet50) were successfully developed and used for OA diagnosis. Deep learning techniques, especially transfer learning, were employed to detect and classify OA in the knee joint from X-ray images. The proposed method achieves automatic classification of knee OA according to the KL scoring system using CNNs with transfer learning. The best result was obtained with the fine-tuned ResNet50 pre-trained network, which reached an overall validation accuracy of 91.51%. The checkpoint and early stopping functions from the Keras library were successfully used to get the best accuracy, stop training when the accuracy did not improve, reduce the training time, and save the best weights for later use.