A Comparison of Different Estimation Methods to Handle Missing Data in Explanatory Variables
Missing data is a common problem in regression models. Statistical software usually handles it by deleting incomplete cases. Deletion reduces the sample size and therefore weakens statistical inference. In this paper, the Expectation Maximization (EM) algorithm, the Multicycle Expectation-Conditional Maximization (MC-ECM) algorithm, the Expectation-Conditional Maximization Either (ECME) algorithm, and Recurrent Neural Networks (RNN) are used to estimate multiple regression models when explanatory variables have missing values. Experimental datasets were generated using the Visual Basic programming language, with values of the explanatory variables missing at random (MAR) in a general pattern. The simulations covered missing ratios of 10%, 20%, and 30%, error variances of 0.5, 1.5, and 2, and sample sizes of 25, 50, 100, and 500, and the methods were evaluated using Mean Squared Error (MSE). Simulation results show that RNN outperforms the other methods, followed by EM at small sample sizes.
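To make the simulation design concrete, the following is a minimal Python sketch of one cell of such an experiment. It is not the authors' Visual Basic code, and it implements only the case-deletion baseline, not EM, MC-ECM, ECME, or RNN. Several details are assumptions made for illustration: three standard normal explanatory variables, a logistic dependence of missingness on the first (always observed) variable as the MAR mechanism, and MSE measured on the regression coefficients. The function names `simulate`, `listwise_ols`, and `coefficient_mse` are hypothetical.

```python
import numpy as np

def simulate(n, missing_ratio, error_var, beta, rng):
    """Generate one dataset with values missing at random in the explanatory variables."""
    beta = np.asarray(beta, dtype=float)
    p = len(beta) - 1                      # number of explanatory variables
    X = rng.normal(size=(n, p))
    y = beta[0] + X @ beta[1:] + rng.normal(scale=np.sqrt(error_var), size=n)

    # Assumed MAR mechanism: the chance that a value in columns 1..p-1 is
    # missing depends only on column 0, which is always observed.
    weights = 1.0 / (1.0 + np.exp(-X[:, 0]))
    prob = np.clip(missing_ratio * weights / weights.mean(), 0.0, 1.0)

    X_miss = X.copy()
    for j in range(1, p):
        mask = rng.random(n) < prob        # independent draw per column
        X_miss[mask, j] = np.nan
    return X_miss, y

def listwise_ols(X_miss, y):
    """Deletion baseline: drop every row with a missing value, then fit OLS."""
    keep = ~np.isnan(X_miss).any(axis=1)
    A = np.column_stack([np.ones(keep.sum()), X_miss[keep]])
    coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
    return coef

def coefficient_mse(n, missing_ratio, error_var, beta, reps=200, seed=0):
    """Average squared error of the estimated coefficients over replications."""
    beta = np.asarray(beta, dtype=float)
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        X_miss, y = simulate(n, missing_ratio, error_var, beta, rng)
        errors.append(np.mean((listwise_ols(X_miss, y) - beta) ** 2))
    return float(np.mean(errors))

if __name__ == "__main__":
    beta = [1.0, 2.0, -1.0, 0.5]           # hypothetical true coefficients
    for n in (25, 50, 100, 500):
        for ratio in (0.1, 0.2, 0.3):
            mse = coefficient_mse(n, ratio, error_var=1.5, beta=beta)
            print(f"n={n:3d} missing={ratio:.0%} MSE={mse:.4f}")
```

Any of the estimation methods compared in the paper could be slotted in where `listwise_ols` is called, which keeps the data generation and the MSE evaluation identical across methods.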