A Survey Study on Proposed Solutions for Imbalanced Big Data

Authors

  • Shaymaa Ahmed Razoqi, Department of Computer Science, College of Education for Pure Science, University of Mosul, Mosul, Iraq (ORCID: https://orcid.org/0000-0002-7822-1378)
  • Ghayda A.A. Al-Talib, Department of Computer Science, College of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq

DOI:

https://doi.org/10.24996/ijs.2024.65.3.37

Keywords:

Imbalanced Data, Machine Learning, Resampling methods, Classifier Performance metrics, Ensemble classifiers

Abstract

     Learning from imbalanced data has been an active research topic for more than two decades. Training data is considered imbalanced when the positive (minority) class is so small relative to the negative (majority) class that it is effectively neglected during learning, a problem compounded by skewed class distributions in binary tasks. The emergence of big data adds new challenges to the imbalance problem, commonly summarized as the 5Vs: volume, velocity, veracity, value, and variety. This study divides solutions to the data imbalance problem into three levels: data-level, algorithm-level, and hybrid approaches. It first presents the standard solutions proposed for the problem and identifies the most important metrics used to measure classification performance on imbalanced data. It then reviews 27 studies from the period 2015–2022, grouped by the level at which they address the imbalance problem, together with the performance metrics used in these studies and the sources of the datasets to which the solutions were applied. The survey gives researchers an overview of the solutions to the data imbalance problem, including recently used hybrid approaches, that they can draw on to improve the classification process.

Published

2024-03-29

Section

Computer Science

How to Cite

Razoqi, S. A., & Al-Talib, G. A. A. (2024). A Survey Study on Proposed Solutions for Imbalanced Big Data. Iraqi Journal of Science, 65(3), 1648-1662. https://doi.org/10.24996/ijs.2024.65.3.37
