Smart Doctor: Performance of Supervised ART-I Artificial Neural Network for Breast Cancer Diagnoses

The Wisconsin Breast Cancer Dataset (WBCD) was employed to evaluate the performance of Adaptive Resonance Theory (ART), specifically the supervised ART-I Artificial Neural Network (ANN), in building a smart system for breast cancer diagnosis. The network was trained with different learning parameters and learning sets. The best result was achieved when the model was trained with 50% of the data and tested with the remaining 50%. Classification accuracy was compared to that of other artificial intelligence algorithms, including fuzzy classifiers, MLP-ANN, and SVM. We achieved the highest accuracy among the works that used such a low learning/testing ratio.


AL-Rawi and AL-Rawi, Iraqi Journal of Science, 2020, Vol. 61, No. 9, pp: 2385-2394. ISSN: 0067-2904

Introduction
Breast cancer is one of the most common cancers among women. It manifests as a non-uniform tumour, but not all tumours are cancerous: non-cancerous tumours are called benign, while cancerous tumours are called malignant. Diagnosing a breast tumour as benign or malignant as early as possible is very important, as it greatly increases the chance of survival. When a specialist is not available, a general physician has to make the diagnosis. Hence, it is useful to employ an artificial intelligence system, trained with the knowledge of experts, to aid in the diagnosis.
The main objective of this paper is to investigate the performance of the ART-ANN across the whole space of the learning parameters. Specifically, we measure the performance of the Supervised ART-I. We also aim to build an artificial intelligence system for Breast Cancer Diagnoses (BCD) based on the original WBC dataset.
The rest of this paper is organized as follows: section two provides a background on the classification of breast cancer. The description of the WBCD dataset is shown in section three. The Supervised ART-I algorithm is listed in section four. The performance of the supervised ART-I is shown in section five. The conclusions and discussion are presented in section six.

Description of WBCD
The WBC dataset contains 699 instances with 9 features and a class label (benign or malignant). The score of each feature is an integer value between 1 and 10. These features, in the order in which they appear in the dataset, are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. They were derived from breast tissue using Fine Needle Aspirates (FNA) from women with breast tumours in order to diagnose the tumour as benign or malignant [1]. The dataset contains 16 instances with missing data. The remaining 683 instances are the scope of this paper. Specifically, they comprise 444 benign and 239 malignant instances [1].
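The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: it assumes a comma-separated row layout (sample ID, nine feature scores, class label) with missing scores marked by `?`. The function name `load_complete_instances` and the three sample rows are hypothetical and made up for illustration only.

```python
from typing import List, Tuple

def load_complete_instances(lines: List[str]) -> List[Tuple[List[int], int]]:
    """Parse rows of 'id,f1,...,f9,label' and skip rows with a missing score."""
    instances = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 11:
            continue  # malformed row (assumed layout: id + 9 features + label)
        if "?" in fields:
            continue  # assumed marker for a missing feature score
        features = [int(v) for v in fields[1:10]]
        label = int(fields[10])
        instances.append((features, label))
    return instances

# Made-up rows for illustration only (not taken from the dataset).
sample = [
    "1,4,2,2,1,2,1,3,1,1,2",
    "2,8,4,5,1,2,?,7,3,1,4",
    "3,6,8,8,1,3,4,3,7,1,2",
]
complete = load_complete_instances(sample)
print(len(complete))  # the row containing '?' is dropped
```

Applied to the full dataset, a filter of this kind would be expected to reduce the 699 instances to the 683 complete ones used in this paper.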

Supervised ART-I ANN
Any ANN should go through two phases, the learning phase and the testing phase. For this, the data is divided into two parts. The first part is used for learning while the second part is used for testing. At the end of the learning phase the weights that connect the input nodes with the category nodes must be determined. Then these weights are used during the testing phase in order to classify the second part of the data. The features and the class code, in a supervised form, are introduced to the supervised ANN during the learning phase. However, only the features are introduced to the ANN during the testing phase. It is the ANN's task to assign the class.
For all ART ANNs, the features are introduced to the network in a normalized form between zero and one. Such an approach has two advantages. First, the initial values of the weights can be set to one. Second, the complements (1 − normalized feature value) can be introduced together with the normalized features. ART ANNs have two parameters that need to be optimized for a specific problem: the vigilance parameter ρ and the learning parameter β.
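The input encoding described above can be illustrated with a short sketch. The paper does not state which normalization is used, so the min-max scaling from the score range 1 to 10 is an assumption, and the function name `complement_code` and the sample scores are hypothetical.

```python
from typing import List

def complement_code(scores: List[int], lo: int = 1, hi: int = 10) -> List[float]:
    """Normalize integer scores to [0, 1] and append their complements."""
    # Assumed min-max scaling; the source only says "normalized to [0, 1]".
    normalized = [(s - lo) / (hi - lo) for s in scores]
    complements = [1.0 - a for a in normalized]
    return normalized + complements

# Made-up feature scores for a single instance (nine features, each 1..10).
coded = complement_code([1, 10, 4, 1, 1, 1, 7, 1, 1])
print(len(coded))  # 18 inputs: 9 normalized features plus 9 complements
```

With M = 9 features, the network therefore receives 2M = 18 inputs per instance.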
There are many supervision approaches for ART-ANN, including the Mapfield approach, as in ARTMAP [25] and Fuzzy ARTMAP [26], the Tagging approach, as in Supervised ART-I [27], and the Bagging approach, as in Supervised ART-II [28]. In theory, all of these supervision approaches have the same classification accuracy. However, the Tagging and Bagging approaches are better than the Mapfield approach in terms of memory requirements and execution time. In addition, their architectures are simpler.

ART ANNs do not fall into local minima, as other approaches do, because they always converge. The weights never increase during the learning phase. This can be clearly seen from the weight learning formula:
$$w_{iJ}^{new} = \beta \min(a_i, w_{iJ}^{old}) + (1 - \beta)\, w_{iJ}^{old}$$
where $w_{iJ}$ is the weight between the input node i and the winning category node J, and $a_i$ is the ith feature in the feature vector. The weight decreases when $a_i < w_{iJ}$; otherwise the weight does not change.
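The monotone behaviour of the weights can be checked with a small sketch. It assumes a fuzzy-ART-style update of the form w ← β·min(a, w) + (1 − β)·w, which matches the description above (the weight decreases only when the feature is smaller than the weight) but is not confirmed by the source as the exact formula. The function name `update_weights` and the sample values are hypothetical.

```python
from typing import List

def update_weights(weights: List[float], inputs: List[float], beta: float) -> List[float]:
    """Assumed fuzzy-ART-style update: w <- beta * min(a, w) + (1 - beta) * w."""
    return [beta * min(a, w) + (1.0 - beta) * w for a, w in zip(inputs, weights)]

w = [1.0, 0.5, 0.2]   # made-up current weights of the winning node
a = [0.4, 0.5, 0.9]   # made-up input features
new_w = update_weights(w, a, beta=0.5)
print(new_w)  # only the first weight moves (toward 0.4); the others are unchanged
```

Under this assumption no weight can ever grow, since min(a, w) ≤ w, which is why repeated presentations cannot oscillate.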
From the computational aspect, the Supervised ART-I requires less learning and testing time than the Supervised ART-II when there are fewer than 1000 Committed Category Nodes (CCNs) during learning [29]. The number of CCNs is bounded by the learning size. However, it is normally much smaller than that, depending on the vigilance parameter ρ ∈ (0, 1] and the learning parameter β ∈ (0, 1]. Since the total number of WBCD instances is less than 1000, the Supervised ART-I is employed for this task. Its architecture is shown in Figure-1. The left array of nodes represents the input features a_i, i = 1, …, 2M, of a single instance with the class code "b". The right array of nodes represents the category nodes N. Only CCNs are assigned tags and weights that connect them with the input nodes [27].

Supervised ART-I Learning Algorithm
The learning algorithm of the Supervised ART-I is as follows:
1-3. […]
4. For the next input of the learning dataset, introduce the normalized instance and its complement a_i, i = 1, …, 2M, and the class code "b". Reset the value of the vigilance parameter to its initial value.
5. Compute the score of each CCN j using Eq. 4.
6. Find the max score among all CCN scores using Eq. 5.
7. If […] then a new category node must be committed and n is incremented.
8-10. […] If […] then put CCN "J" in shut-off mode, […], and assign the new value for ρ using Eq. 9; go to step 6.
11. If there are more input features go to step 4.
12. Save all CCNs with their tags and weights.
13. End of the learning phase.
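The overall control flow of the learning phase can be sketched as below. This is a simplified tagged, fuzzy-ART-style learner written under stated assumptions, not the authors' implementation: the score (sum of min(a_i, w_ij) divided by the sum of a_i), the commit rule, the weight update, and the way vigilance is raised after a tag mismatch are all assumed. The names `score`, `learn`, `rho0`, `eps`, and the sample data are hypothetical.

```python
from typing import List, Tuple

def score(a: List[float], w: List[float]) -> float:
    """Assumed match score: sum(min(a_i, w_i)) / sum(a_i)."""
    return sum(min(x, y) for x, y in zip(a, w)) / sum(a)

def learn(data: List[Tuple[List[float], int]], rho0: float, beta: float,
          fast: bool = False, eps: float = 1e-6):
    """Return (weights, tags) of the committed category nodes (CCNs)."""
    weights: List[List[float]] = []
    tags: List[int] = []
    for a, b in data:                 # next complement-coded input and class code
        rho = rho0                    # reset vigilance to its initial value
        shut_off = set()
        while True:
            # Score every CCN that is not shut off and pick the best one.
            active = [(score(a, w), j) for j, w in enumerate(weights)
                      if j not in shut_off]
            best, J = max(active) if active else (0.0, -1)
            if J < 0 or best < rho:
                # No CCN matches well enough: commit a new node.
                # Initial weights are 1, so the update gives beta*a + (1 - beta);
                # the fast mode is assumed to copy the features directly.
                cb = 1.0 if fast else beta
                weights.append([cb * x + (1.0 - cb) for x in a])
                tags.append(b)
                break
            if tags[J] == b:
                # Winner carries the right tag: move its weights toward the input.
                weights[J] = [beta * min(x, w) + (1.0 - beta) * w
                              for x, w in zip(a, weights[J])]
                break
            # Wrong tag: shut the winner off and raise vigilance (assumed rule).
            shut_off.add(J)
            rho = best + eps
    return weights, tags

# Made-up, already complement-coded inputs with M = 1 (feature, complement).
data = [([0.1, 0.9], 1), ([0.9, 0.1], 2), ([0.2, 0.8], 1)]
weights, tags = learn(data, rho0=0.8, beta=0.5)
print(len(weights), tags)
```

In this toy run the third input resonates with the first node rather than committing a new one, which is the mechanism that keeps the number of CCNs well below the learning size.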

Supervised ART-I Testing Algorithm
The testing algorithm of the Supervised ART-I is as follows:
1-4. […]
5. Find the max score using Eq. 5.
6. Assign Tag(J) as the class of the current features.
7. If there is more data to be classified go to step 3.
8. End of the testing phase.
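The surviving testing steps (find the best-scoring CCN, then assign its tag) can be sketched as follows. The score function is an assumption, since Eq. 4 and Eq. 5 are not available here, and the function name `classify` together with the weights and tags in the example are hypothetical.

```python
from typing import List

def classify(a: List[float], weights: List[List[float]], tags: List[int]) -> int:
    """Return the tag of the CCN with the highest (assumed) score."""
    def score(w: List[float]) -> float:
        # Assumed score: sum(min(a_i, w_i)) / sum(a_i)
        return sum(min(x, y) for x, y in zip(a, w)) / sum(a)
    J = max(range(len(weights)), key=lambda j: score(weights[j]))
    return tags[J]

weights = [[0.375, 0.875], [0.95, 0.55]]  # made-up weights of two CCNs
tags = [1, 2]                             # class codes: benign = 1, malignant = 2
print(classify([0.3, 0.7], weights, tags))
print(classify([0.8, 0.2], weights, tags))
```

No weights are changed during testing; the class code is never shown to the network, only read from the tag of the winning node.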

Performance of the Supervised ART-I
The performance of the Supervised ART-I ANN is examined using the well-known WBC dataset benchmark. Specifically, the aim of the classification is to distinguish between benign and malignant cases according to the nine input features of WBC. We introduce a_1, a_2, …, a_9, normalized to [0, 1], together with their complements 1 − a_1, 1 − a_2, …, 1 − a_9 and the class code (benign = 1, malignant = 2) to the Supervised ART-I ANN during the learning phase.

The performance is measured with different vigilance ρ and learning β parameters. Furthermore, the size of the learning set is 341, which represents about 50% of the total WBC dataset, while the remaining data is used for testing.
Learning is performed using every combination of the values of the vigilance parameter ρ ∈ [0.01, 1] and the learning parameter β ∈ [0.01, 1] with a step size of 0.01. Furthermore, the performance of both the normal and fast modes is investigated. This gives a total of 20,000 different runs (100 × 100 parameter combinations in each of the two modes), which show the performance of the Supervised ART-I ANN across the whole space of the ρ and β parameters.
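The size of this parameter sweep can be verified with a few lines. The sketch only enumerates the grid; the variable names are hypothetical, and integer steps are used to avoid accumulating floating-point error.

```python
from itertools import product

# 0.01, 0.02, ..., 1.00 generated from integers to avoid floating-point drift.
steps = [k / 100 for k in range(1, 101)]
modes = ["normal", "fast"]

runs = [(rho, beta, mode) for rho, beta, mode in product(steps, steps, modes)]
print(len(runs))  # 20000
```

Each tuple corresponds to one complete learning and testing run of the network.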
The classification accuracy for learning in the normal mode exceeds 98% for every combination of ρ and β. The number of CCNs is shown in Figure-2 as a contour plot across the entire space of ρ and β. The best accuracy, 99.71%, was achieved with ρ = 0.82 and β = 0.85. The accuracy for the malignant class is 100%, while the accuracy for the benign class is 99.62%, with only one misclassified feature vector out of the 262 benign feature vectors, as shown in Table-1. The learning time for this run was 17 ms, while the testing time was 10 ms.
The classification accuracy for the fast learning mode is slightly lower than that for the normal mode. Specifically, it exceeds 97% for the whole space. The total number of CCNs is shown in Figure-3. It can be observed in the normal mode that the number of CCNs is higher when the values of β or ρ are high. The best overall accuracy in the fast mode is 99.42% at ρ = 0.65 and β = 0.51 with 33 CCNs. Specifically, the accuracy for the benign and malignant classes is 99.23% and 100%, respectively, as shown in Table-1. The learning time for this run was 11 ms, while the testing time was 6 ms.
When learning with just 25 instances, an accuracy of more than 90% is achieved with any combination of ρ and β. In particular, the highest accuracy for the normal mode is 96.50%. Learning in the normal mode with five epochs increased the overall accuracy to 96.81%, whereas learning in the fast mode increased the accuracy to 97.26%, as shown in Table-2. Moreover, the resulting network consists of only three category nodes.
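The figures reported for the best normal-mode run are mutually consistent, as the following arithmetic sketch shows. It assumes the testing set is the 683 − 341 = 342 instances left after learning; the number of malignant testing instances (80) is derived here from that assumption rather than taken from the text. The function name `accuracy` is hypothetical.

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total

test_total = 683 - 341                        # instances left for testing
benign_total, benign_wrong = 262, 1           # reported for the best normal-mode run
malignant_total = test_total - benign_total   # derived, not reported in the text

benign_acc = round(accuracy(benign_total - benign_wrong, benign_total), 2)
malignant_acc = round(accuracy(malignant_total, malignant_total), 2)
overall_acc = round(accuracy(test_total - benign_wrong, test_total), 2)
print(benign_acc, malignant_acc, overall_acc)  # 99.62 100.0 99.71
```

A single misclassified benign instance out of 342 testing instances therefore accounts for both the 99.62% benign accuracy and the 99.71% overall accuracy.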
Figure-4 shows the complete Supervised ART-I ANN with the values of the weights for each of the three category nodes and the 18 input nodes.

Table-3. Comparison with previous works
Reference  Method                                       Accuracy (%)  Validation
[…]        Fuzzy-AIS-kNN                                99.14         10-fold cross validation
[9]        FW-KNNI                                      96.04         10-fold cross validation
[10]       Voting classifier (Naïve Bayes + SVM + J48)  97.13         10-fold cross validation
[11]       J48 and MLP with PCA                         97.57         10-fold cross validation
[12]       Least square SVM                             […]           […]

Discussion and Conclusions
All previous works trained their systems with at least 50% of the data. In particular, only two works [3,12] trained their systems with 50% of the data. One study [3] achieved an accuracy of 97.36% using a fuzzy-genetic classifier, while another [12] achieved 95.89% accuracy using a least square (LS) SVM classifier. A better accuracy was achieved in this work with the same learning size. Two works [15,20] achieved better accuracy, but with a learning size of more than 50%. In particular, an accuracy of 99.74% was achieved [15] with 10-fold cross validation using parallel time variant particle swarm optimization (PTVPSO) for parameter optimization and feature selection for an SVM classifier, while an accuracy of 100% with an 80% learning size was achieved using a Rough Set (RS) and Extreme Learning Machine (ELM) classifier [20]. A comparison of this work with previous ones is shown in Table-3.
It is important to note how fast the Supervised ART-I executes the learning and testing phases. In particular, the learning and testing time is no longer than 200 ms for any single run. This made it practical to perform the learning and testing for all 20,000 runs across the learning parameters ρ and β and the two learning modes.
The classification accuracy for the majority of the runs across the whole space of ρ and β, and for different learning sizes, was higher than 90%, which indicates the high performance of the Supervised ART-I classifier. Furthermore, the best accuracy was 99.71% at ρ = 0.82 and β = 0.85 when learning in the normal mode. In this run, only one benign instance was misclassified as malignant, while all the other instances were classified correctly. Moreover, the best-accuracy run took 17 ms for the learning phase and 10 ms for the testing phase.
It can be seen that as ρ increases, more CCNs are generated. This happens because the matching criterion for the CCNs becomes stricter. The number of CCNs also increases with β. Furthermore, the Supervised ART-I performs very well even with a very small learning size, achieving 96.50% accuracy in the normal mode. The performance improves further in the fast learning mode, which achieves 97.26% accuracy. The higher accuracy of the fast mode with such a small learning size is due to the assignment of the feature values to the weights when a node is committed. In the normal mode, however, the weight value lies between the feature value and 1 when a node is committed; the higher β is, the closer the weights are to the values of the input features. Thus, in the normal mode there are not enough training instances to decrement the weights to their optimal values.
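The difference between the two modes at commit time can be written out explicitly. The derivation below is a sketch that assumes a fuzzy-ART-style update rule and initial weights equal to 1, as described in the section on the Supervised ART-I ANN; the exact formula used by the authors is not confirmed here, and the numerical example uses made-up values.

```latex
% Assumed update rule (fuzzy-ART style), with initial weight w_{iJ}^{old} = 1
\begin{align*}
w_{iJ}^{new} &= \beta \min(a_i, w_{iJ}^{old}) + (1-\beta)\, w_{iJ}^{old} \\
% Normal mode at commit: w_{iJ}^{old} = 1 and a_i \le 1
w_{iJ}^{commit,\,normal} &= \beta a_i + (1-\beta) = 1 - \beta\,(1 - a_i),
  \qquad a_i \le w_{iJ}^{commit,\,normal} \le 1 \\
% Fast mode at commit: the feature value is assigned directly
w_{iJ}^{commit,\,fast} &= a_i \\
% Illustrative values: a_i = 0.4, \beta = 0.5
w_{iJ}^{commit,\,normal} &= 1 - 0.5\,(1 - 0.4) = 0.7,
  \qquad w_{iJ}^{commit,\,fast} = 0.4
\end{align*}
```

Under these assumptions a normal-mode weight starts above the feature value and needs further matching instances to decrease toward it, which is consistent with the lower normal-mode accuracy when only 25 learning instances are available.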