A Comparative Study of Probabilistic and Ensemble Learning for Liver Disease Diagnosis

Israa Mohammed Hassoon

doi:10.24996/ijs.2025.66.4.25

Authors

Israa Mohammed Hassoon Department of Mathematics, College of Science, Mustansiriyah University, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2025.66.4.25

Keywords:

Probabilistic Learning, Ensemble Learning, Liver Disease, Naïve Bayes, Logistic Regression, Extreme Gradient Boosting, Random Forest

Abstract

Early diagnosis of liver disease is extremely challenging because the disease lacks recognizable symptoms. When liver disorders are identified early, patients can start treatment before it is too late, potentially saving their lives. An automated diagnosis model is therefore needed to avoid misdiagnoses and delayed diagnoses. This article presents a comparative analysis of probabilistic and ensemble learning, both of which have demonstrated efficacy in solving real-world problems. Four methods are used in total: two probabilistic (Naïve Bayes and logistic regression) and two ensemble (random forest and extreme gradient boosting). A substantial liver dataset is employed, containing the records of 30691 individuals, 21917 of whom have liver disease and 8774 of whom do not. First, ten patient attributes are used to derive a sufficient set of features. After preprocessing, 70% of the patient data is used for training and 30% for testing, and the data are passed to the four classifiers. The hyperparameters of each of the four algorithms (such as gamma, the number of estimators, max-iter, max-leaf nodes, and max-depth) are set. Specificity, sensitivity, accuracy, and F1-score are the four quantitative measures used to evaluate the performance of each model. The results obtained from the four models are compared with each other and with previous research. By lowering the rate of wrong diagnoses, this work could help save more lives. The results show that fine-tuning hyperparameters improves model performance, and that combining the outputs of multiple weak models through ensemble methods increases accuracy. The ensemble algorithms reached an accuracy of 100%, outperforming the probabilistic approaches, which reached an accuracy of 91.64%.
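The four evaluation measures named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch in Python, not the author's implementation: the names `ConfusionCounts` and `evaluate` are hypothetical, "positive" is assumed to mean "has liver disease", and the counts in the usage example are invented purely for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical helper type).

    Assumption: the positive class is "has liver disease".
    """
    tp: int  # diseased patients correctly flagged
    fp: int  # healthy patients wrongly flagged
    tn: int  # healthy patients correctly cleared
    fn: int  # diseased patients missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute sensitivity, specificity, accuracy, and F1-score."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        # Share of diseased patients the model detects (recall).
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        # Share of healthy patients the model correctly clears.
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # Share of all predictions that are correct.
        "accuracy": _ratio(c.tp + c.tn, total),
        # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Illustrative counts only (not taken from the paper).
example = evaluate(ConfusionCounts(tp=90, fp=10, tn=40, fn=10))
```

Because the dataset is imbalanced (21917 diseased versus 8774 healthy records), accuracy alone can look favorable for a model that over-predicts the majority class, which is one reason to report specificity and sensitivity alongside it.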