Application of Data Science Techniques and Machine Learning based classifiers for Transformer Health Assessment

Sushma Sagar Emme; Pratapa Raju Moola

doi:10.24996/ijs.2025.66.2.30

Authors

Sushma Sagar Emme, Computer Science and Multi Media, Lincoln University College, Petaling Jaya, Malaysia
Pratapa Raju Moola, Engineering Department, University of Technology and Applied Sciences, Ibra, Oman

DOI:

https://doi.org/10.24996/ijs.2025.66.2.30

Keywords:

Dissolved Gas Analysis, Exploratory Data Analysis, Support Vector Machine, Random Forest, XGBoost, k-Nearest Neighbours

Abstract

Data science and machine learning play a major role in assessing, predicting, and maintaining the health of power transformers. This paper applies data science techniques to analyze and interpret Dissolved Gas Analysis (DGA) datasets from power transformers in order to predict the Health Index (HI). Exploratory Data Analysis (EDA), using a correlation matrix and heat maps, revealed the correlations among the features and indicated that the dataset is imbalanced; oversampling is therefore employed to balance the data. Principal Component Analysis (PCA) is used to estimate the principal components of the data, which helps in selecting the features most useful for prediction. Four classifiers, namely Support Vector Machine (SVM), Random Forest (RF), XGBoost, and k-Nearest Neighbors (KNN), are applied to both the balanced and the imbalanced data, and the results are compared. The RF classifier outperformed all the other classifiers, with an accuracy of 96.9%.
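
The balancing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling variant was used, so plain random oversampling (duplicating minority-class samples) is assumed here. The function name `random_oversample`, the toy feature values, and the class labels "good", "fair", and "poor" are all hypothetical and are not taken from the paper's dataset.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (assumed variant; the
    paper does not specify its oversampling method)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: two gas concentrations per sample, three health classes.
X = [[12.0, 3.1], [15.0, 2.9], [14.0, 3.3], [13.0, 3.0],
     [80.0, 9.5], [75.0, 10.1], [200.0, 30.0]]
y = ["good", "good", "good", "good", "fair", "fair", "poor"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In a typical workflow, oversampling of this kind is applied only to the training split, so that duplicated samples do not leak into the data used to measure accuracy.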