Breast Cancer Detection using Decision Tree and K-Nearest Neighbour Classifiers

Fatin Kadhim Nasser; Suhad Faisal Behadili

doi:10.24996/ijs.2022.63.11.34

Authors

Fatin Kadhim Nasser, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq, https://orcid.org/0000-0002-5959-6051
Suhad Faisal Behadili, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2022.63.11.34

Keywords:

Breast Cancer, Gini index, Entropy, confusion matrix, classification report

Abstract

Data mining plays an important role in healthcare by uncovering hidden relationships in large datasets, particularly in the diagnosis of breast cancer, a leading cause of death worldwide. In this paper, two algorithms, decision tree and K-Nearest Neighbour, are applied to diagnose breast cancer grade in order to reduce the risk to patients. For the decision tree with feature selection, the Gini index gives an accuracy of 87.83%, while entropy gives an accuracy of 86.77%. In both cases, Age appeared as the most influential parameter, particularly at the split Age < 49.5, and Ki67 appeared as the second most influential parameter. For K-Nearest Neighbour, the value of K is selected based on the minimum error rate and the maximum test accuracy, giving an accuracy of 86.24%, with Euclidean distance as the distance metric. The models indicate that Grade 2 breast cancer is the most prevalent type. As future work, a comparative study of supervised and unsupervised data mining algorithms could be performed.
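The two split criteria and the K selection rule named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or dataset: the function names are hypothetical, the six (Age, Ki67) records and their grades are invented purely for illustration, and only the Python standard library is used. It shows the Gini index and entropy of a label set, the weighted impurity of a threshold split such as Age < 49.5, and a K-Nearest Neighbour vote with Euclidean distance where K is chosen by the lowest error rate on a held-out set.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(rows, labels, feature, threshold, criterion):
    """Weighted impurity of the two groups produced by row[feature] < threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] < threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] >= threshold]
    n = len(labels)
    return sum(len(part) / n * criterion(part) for part in (left, right) if part)


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def error_rate(train_x, train_y, test_x, test_y, k):
    """Fraction of held-out points that k-NN misclassifies."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y for x, y in zip(test_x, test_y)
    )
    return wrong / len(test_y)


def best_k(train_x, train_y, test_x, test_y, candidates):
    """Candidate K with the lowest held-out error (first one wins on ties)."""
    return min(
        candidates, key=lambda k: error_rate(train_x, train_y, test_x, test_y, k)
    )


# Invented toy records: (Age, Ki67) -> grade. Not taken from the paper.
TOY_X = [(35, 10), (42, 15), (47, 20), (52, 40), (58, 45), (63, 50)]
TOY_Y = [1, 1, 1, 2, 2, 2]
HELD_OUT_X = [(40, 12), (60, 48)]
HELD_OUT_Y = [1, 2]

if __name__ == "__main__":
    print("Gini before split:", gini(TOY_Y))
    print("Entropy before split:", entropy(TOY_Y))
    print("Gini after Age < 49.5:", split_impurity(TOY_X, TOY_Y, 0, 49.5, gini))
    print("Chosen K:", best_k(TOY_X, TOY_Y, HELD_OUT_X, HELD_OUT_Y, [1, 3, 5]))
```

On real data the two criteria usually rank candidate splits similarly, which is consistent with the close accuracies reported for them in the abstract. The features would normally be scaled before computing Euclidean distances, since Age and Ki67 are on different ranges; the sketch omits that step for brevity.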