Application of Data Mining and Imputation Algorithms for Missing Value Handling: A Study Case Car Evaluation Dataset

Authors

  • Wahyu Widyananda, Electrical Engineering Department, Brawijaya University, East Java, Indonesia (https://orcid.org/0000-0003-2550-4240)
  • Muhammad Fauzan Edy Purnomo, Electrical Engineering Department, Brawijaya University, East Java, Indonesia
  • Muhammad Aswin, Electrical Engineering Department, Brawijaya University, East Java, Indonesia
  • Panca Mudjirahardjo, Electrical Engineering Department, Brawijaya University, East Java, Indonesia
  • Sholeh Hadi Pramono, Electrical Engineering Department, Brawijaya University, East Java, Indonesia

DOI:

https://doi.org/10.24996/ijs.2023.64.5.32

Keywords:

C5.0, k-NNI, Data Mining, Missing Value Handling, R Studio

Abstract

     Data mining is a software-assisted data analysis process that finds patterns or rules in large amounts of data, with the aim of providing knowledge to support decisions. However, missing values in the data often lead to a loss of information. The purpose of this study is to improve classification performance on data that contains missing values. Testing is carried out on the Car Evaluation dataset from the UCI Machine Learning Repository, using RStudio and RapidMiner to run the algorithms. The study analyzes the tested parameters to measure algorithm performance under four test variations: (1) C5.0, C4.5, and k-NN at a 0% missing rate; (2) C5.0, C4.5, and k-NN at 5–50% missing rates; (3) C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI at 5–50% missing rates; and (4) C5.0 + CMI, C4.5 + CMI, and k-NN + CMI at 5–50% missing rates. The results show that C5.0 with k-NNI produces better classification accuracy than the other tested combinations of imputation and classification algorithms. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also has shorter processing times than the other methods.
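
The k-NNI step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the study used RStudio and RapidMiner); it assumes categorical attributes, a Hamming-style distance over the attributes observed in both rows, and imputation by the most frequent value among the k nearest rows. The function names, the value of k, and the toy rows are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

MISSING = None  # marker for a missing cell (an assumption of this sketch)


def mask_values(rows, rate, seed=0):
    """Return a copy of rows in which each cell is set to MISSING with probability `rate`."""
    rng = random.Random(seed)
    return [[MISSING if rng.random() < rate else v for v in row] for row in rows]


def hamming(a, b):
    """Fraction of mismatching attributes among those observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")  # no overlap, so treat the rows as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows, k=3):
    """Fill each missing cell with the mode of that attribute among the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Candidate donors: other rows in which attribute j is observed.
            donors = sorted(
                (hamming(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not MISSING
            )
            nearest = [rows[idx][j] for _, idx in donors[:k]]
            if nearest:  # leave the cell missing if no donor exists
                filled[i][j] = Counter(nearest).most_common(1)[0][0]
    return filled


if __name__ == "__main__":
    # Hypothetical toy rows in the style of categorical car attributes.
    toy = [
        ["low", "high", "big"],
        ["low", "high", "big"],
        ["vhigh", "low", "small"],
        ["vhigh", "low", "small"],
    ]
    damaged = mask_values(toy, rate=0.35, seed=1)
    print(knn_impute(damaged, k=2))
```

In a pipeline like the one the abstract describes, the imputed rows would then be passed to a classifier such as C5.0, C4.5, or k-NN, and accuracy compared across missing rates.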

Published

2023-05-30

Section

Computer Science

How to Cite

Application of Data Mining and Imputation Algorithms for Missing Value Handling: A Study Case Car Evaluation Dataset. (2023). Iraqi Journal of Science, 64(5), 2481-2491. https://doi.org/10.24996/ijs.2023.64.5.32
