Optimal Number of Clusters by Using Four Indexes

Hanin Haqi Ismail; Tareef Kamil Mustafa


Authors

Hanin Haqi Ismail, Computer Science, College of Science, University of Baghdad, Baghdad, Iraq
Tareef Kamil Mustafa, Computer Science, College of Science, University of Baghdad, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2026.67.2.35

Keywords:

clustering, K-means, machine learning, Elbow method, Silhouette score, Gap statistic, Davies-Bouldin index

Abstract

In data analysis, clustering is a machine learning technique that groups similar data points or objects according to their features, attributes, or characteristics. Clustering aims to detect underlying patterns or structures in data without prior knowledge of group labels. Many clustering algorithms exist; K-means is one of the most widely used, and its performance depends on the choice of initial points (centroids) and on the value of K. Most clustering techniques require the number of clusters to be specified in advance, yet in most cases estimating that value is computationally expensive. In this paper, an algorithm is designed to compute the appropriate number of clusters for a dataset using several cluster validity indexes (CVIs). The four CVIs used are among the most popular: the Elbow method, the Silhouette score, the Gap statistic, and the Davies-Bouldin index. The paper also proposes a new technique for estimating the appropriate number of clusters (k) based on the index values and their ranks. The best result of the proposed Optimal Number of Clusters (ONC) algorithm, measured by the average silhouette score, is 0.501.
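The abstract names four cluster validity indexes and a rank-based way of combining them but gives no formulas. The Python sketch below, using only the standard library, shows one plausible reading under several assumptions that are not taken from the paper. K-means uses k-means++ seeding with several restarts, a common remedy for its sensitivity to the initial points. The elbow is placed at the largest second difference of the within-cluster sum of squares (WCSS). The Gap statistic is ranked by its raw value, a simplification of Tibshirani's one-standard-error rule. The four rankings are then combined by a simple rank sum. All function names (`kmeans`, `choose_k`, `three_blobs`, and so on) are hypothetical, and the rank-sum vote is only an illustration, not the authors' ONC algorithm.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]

def kmeans(points, k, rng, n_init=8, max_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns (wcss, labels, centers) of the best restart."""
    best = None
    for _ in range(n_init):
        centers = [rng.choice(points)]
        while len(centers) < k:  # k-means++: distant points are sampled more often
            d = [min(dist2(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d)[0])
        for _ in range(max_iter):
            labels = assign(points, centers)
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Move each center to its cluster mean; an empty cluster keeps its old center.
                new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j])
            if new == centers:
                break
            centers = new
        labels = assign(points, centers)
        wcss = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wcss < best[0]:
            best = (wcss, labels, centers)
    return best

def silhouette(points, labels, k):
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    total = 0.0
    for p, own in zip(points, labels):
        sums, counts = [0.0] * k, [0] * k
        for q, lab in zip(points, labels):
            sums[lab] += math.sqrt(dist2(p, q))
            counts[lab] += 1
        if counts[own] == 1:
            continue
        a = sums[own] / (counts[own] - 1)  # mean distance to the rest of its own cluster
        b = min(sums[j] / counts[j] for j in range(k) if j != own and counts[j])
        total += (b - a) / max(a, b)
    return total / len(points)

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index: lower means compact, well-separated clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.sqrt(dist2(p, centers[j])) for p in members) / max(len(members), 1))
    return sum(
        max((scatter[i] + scatter[j]) / math.sqrt(dist2(centers[i], centers[j])) for j in range(k) if j != i)
        for i in range(k)
    ) / k

def gap_statistic(points, k, wk, rng, refs=5):
    """Gap(k): mean log WCSS of uniform reference sets (same bounding box) minus log WCSS of the data."""
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    ref_logs = []
    for _ in range(refs):
        ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        ref_logs.append(math.log(kmeans(ref, k, rng, n_init=2)[0]))
    return sum(ref_logs) / refs - math.log(wk)

def choose_k(points, k_values=range(2, 7), seed=0):
    """Rank each candidate k under four indexes and return the k with the smallest rank sum."""
    rng = random.Random(seed)
    fits = {k: kmeans(points, k, rng) for k in range(min(k_values) - 1, max(k_values) + 2)}
    wcss = {k: fit[0] for k, fit in fits.items()}
    # Every score below is "higher is better", so Davies-Bouldin is negated.
    scores = {
        "elbow": {k: wcss[k - 1] - 2 * wcss[k] + wcss[k + 1] for k in k_values},  # sharpest bend
        "silhouette": {k: silhouette(points, fits[k][1], k) for k in k_values},
        "davies_bouldin": {k: -davies_bouldin(points, fits[k][1], fits[k][2]) for k in k_values},
        "gap": {k: gap_statistic(points, k, wcss[k], rng) for k in k_values},
    }
    rank_sum = {k: 0 for k in k_values}
    for per_k in scores.values():
        for rank, k in enumerate(sorted(per_k, key=per_k.get, reverse=True), start=1):
            rank_sum[k] += rank
    return min(rank_sum, key=rank_sum.get), rank_sum

def three_blobs(n=30, seed=1):
    """Synthetic data: three well-separated Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centers for _ in range(n)]
```

For example, `choose_k(three_blobs())` scores the candidates k = 2 to 6 on three synthetic, well-separated blobs and is expected to select k = 3, returning the per-k rank totals alongside it. Ranking rather than averaging raw scores sidesteps the fact that the four indexes live on different scales. On real data, the candidate range `k_values` and the number of reference sets `refs` would need tuning.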