Least Squares Estimations for the General Linear Model Parameters with Epsilon Skew Normal Error Term

Examining skewness makes researchers more aware of the importance of accurate statistical analysis. Most phenomena exhibit some degree of skewness, which gives rise to what is called "asymmetry" and, consequently, to the importance of the skew normal family. The epsilon skew normal distribution ESN (μ, σ, ε) is one of the probability distributions that provide a more flexible model, because its skewness parameter allows the distribution to range from normal to skewed. Theoretically, estimating the parameters of a linear regression model whose error term has a non-zero mean is a major challenge, because no explicit formula for these estimates can be obtained. In practice, values for these estimates can be obtained only by numerical methods. This paper estimates the parameters of the Epsilon Skew Normal General Linear Model (ESNGLM) using an adaptive least squares method, and the parameters of the General Linear Model (GLM) using the ordinary least squares method. The coefficient of determination is used as the criterion for comparing the two models. The methods were applied to real data on dollar exchange rates. The analysis was carried out in MATLAB, and the results showed that the ESNGLM is a satisfactory model.


Introduction
Location-scale families play a vital role in modeling and statistical analysis, which reflects the importance of skew distributions in this field. Skew distributions were proposed in order to obtain distributions that adapt as closely as possible to real data [1]. The basic idea is to start from a symmetric distribution and obtain a family with an additional parameter that represents the degree of skewness [2]. The normality assumption is not satisfied in most of the studied phenomena because of skewness, which may be a key reason for a weak regression model. Several methods, including log transformation of the data, Box-Cox transformations and others, have been used to address this weakness. However, these methods are not sufficient in many applications, especially biological ones [3]. In such cases, the alternative is to use distributions that are consistent with the data, so that the resulting models are flexible enough to adapt to each specific degree of skewness [4].
There are various methods of linear regression estimation, and perhaps the most prominent and basic one in statistics is Ordinary Least Squares (OLS). This method rests on assumptions about the random error (U_i), which should be normally distributed with a mean of 0 and a variance of σ². Because traditional estimation methods are sensitive to outliers, many researchers have dealt with outliers by deleting them or by replacing them with estimates based on other observations, which reduces the reliability of the description of the studied phenomenon. If these values are kept, the estimation method should be adapted to an error distribution that is largely consistent with the data. Such methods have sufficient capacity to handle outliers in the data, and thus give more efficient, accurate, realistic, and logical results. The main problem addressed in this research is that the normal distribution is not always tenable as a good fit for the data, especially when the data are skewed and contain outliers. The ESN distribution is useful for modeling skewed data and controlling outliers on the right and left sides of the distribution curve.
The purpose of this paper is to estimate the parameters of ESNGLM using the adaptive least squares (ALS) method, as well as the parameters of GLM using the ordinary least squares (OLS) method. We also compare the two models using the coefficient of determination (R²).
Skew distributions were first described in 1897 by Fechner [5], who demonstrated that a skew distribution can be obtained by joining two half-normal distributions with different scale parameters (σ). In 1976, O'Hagan and Leonard [6] presented the Skew-Normal (SN) distribution in Bayesian analysis. In 1985, Azzalini [7] investigated the skew normal (SN) distribution thoroughly and studied the properties of its Probability Density Function (PDF), which is as follows: $f(x; \lambda) = 2\,\phi(x)\,\Phi(\lambda x)$ … (1) where $\phi(\cdot)$ represents the PDF of the standard normal distribution, $\Phi(\cdot)$ stands for the Cumulative Distribution Function (CDF) of the standard normal distribution, and $\lambda$ represents the skew parameter, whose value lies between $-\infty$ and $\infty$. Articles by Caudill [8] in 1993, Polachek and Robst [9] in 1998, and Louis, Blenman, and Thatcher [10] in 1999 included further applications of the skew normal (SN) distribution to the real estate, labor and financial markets, respectively. In 2000, Mudholkar and Hutson [11] presented the Epsilon Skew-Normal (ESN) distribution. In 2010, Dey and Debarshi [12] suggested a method for calculating estimates of the skew normal parameters based on the ratio between the PDF and CDF of the normal distribution, using linear and non-linear equations. The results showed that the linear equation produces satisfactory results. In 2013, Abdulah and Elsalloukh [13] studied the estimation of the Epsilon Skew Gamma (ESΓ) distribution and its application to ESΓ-distributed data [14]. In 2015, Mudholkar et al. [3] introduced the M-Gaussian (M-G) distribution for right-skewed data sets. It has two parameters, one of which represents the mode. A related skew form in the sense of Azzalini (1985), based on the symmetric component normal distribution, was called the Skew Symmetric Component Normal (SSCN) distribution.
In 2018, Yalçınkaya and his colleagues [16] used the Genetic Algorithm (GA) to obtain maximum likelihood estimates of the skew normal (SN) distribution parameters and compared it with other iterative techniques, such as Newton-Raphson (NR), Nelder-Mead (NM), and the Iteratively Re-Weighting Algorithm (IRA). By calculating the mean squared error (MSE) of each estimator under each algorithm, they showed that the Genetic Algorithm is the most efficient of these algorithms. In 2019, Hutson et al. [17] proposed a generalization of the log-normal distribution called the Log-Epsilon-Skew-Normal (LESN) distribution. They studied the main properties of LESN, such as the hazard function, moments, and skewness and kurtosis coefficients, and used the maximum likelihood method to estimate the LESN parameters.
This paper is organized as follows. In section 2, we show the ESN distribution with some properties. In section 3, we introduce the ESNGLM model. In section 4, we demonstrate the adaptive least squares estimates. In section 5, we explain the numerical algorithm Trust-Region-Dogleg. In section 6, we describe, plot, and test the real data. In section 7, we obtain the estimation results of ESNGLM parameters, and in section 8 we focus on some important conclusions.

The Epsilon Skew Normal Distribution
Mudholkar and Hutson [11] proposed the skew normal distribution in a form that differs from that of Azzalini [7] described in equation (1). They called it the Epsilon Skew Normal (ESN) distribution. Its probability density function is given in equation (2), where μ, σ and ε represent the location, scale and skewness parameters, respectively [11]. In this research, the probability density function (2) is used as the distribution of the random error in the linear regression model; its mean and variance are given in equations (3) and (4).
Figure-1 shows the PDF of the ESN distribution for several values of ε. If ε > 0, the distribution is right skewed, which is evident from the black curve leaning to the left compared with the blue curve, which represents the normal distribution. In this case the skewness is positive: the longer tail lies on the right side, and the mean is greater than the median and the mode (the mean is located to the right of the mode). If ε < 0, the distribution is left skewed, which is evident from the red curve leaning to the right compared with the normal distribution. This is a negative skew: the longer tail lies on the left side, and the mode is greater than the median and the mean (the mean is located to the left of the mode). In both cases, the skewed side has a longer tail than the other side [17].
The CDF of the standard ESN distribution, ESN (0, 1, ε) [17], is given in equation (5), where Φ(x) refers to the CDF of the standard normal distribution. Figure-3 illustrates the effect of ε on the form of the CDF of the ESN distribution for several values of ε. The blue curve shows the CDF of the normal distribution (ε = 0), while the red curve shows the CDF of the right skewed ESN distribution with ε = 0.5, whose right tail is longer than its left tail. The black curve illustrates the CDF of the left skewed ESN distribution with ε = -0.5, whose left tail is longer than its right tail.
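The density, CDF, mean and variance discussed in this section can be sketched in a few lines of Python. This is a minimal sketch that assumes the parametrization of Mudholkar and Hutson (2000), in which the scale is σ(1 + ε) to the left of μ and σ(1 - ε) to the right. In that form a positive ε lengthens the left tail, so the sign of ε may need to be flipped to match the convention used in the description above. The function names are illustrative and are not taken from the paper.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def esn_pdf(x, mu=0.0, sigma=1.0, eps=0.0):
    """ESN density (assumed form): two half-normals joined at the mode mu."""
    scale = sigma * (1.0 + eps) if x < mu else sigma * (1.0 - eps)
    z = (x - mu) / scale
    return math.exp(-0.5 * z * z) / (sigma * SQRT_2PI)

def esn_cdf(x, mu=0.0, sigma=1.0, eps=0.0):
    """ESN distribution function under the same assumed parametrization."""
    z = (x - mu) / sigma
    if z < 0:
        return (1.0 + eps) * std_normal_cdf(z / (1.0 + eps))
    return eps + (1.0 - eps) * std_normal_cdf(z / (1.0 - eps))

def esn_mean(mu, sigma, eps):
    """Mean of ESN(mu, sigma, eps); it differs from mu whenever eps != 0."""
    return mu - 4.0 * sigma * eps / SQRT_2PI

def esn_var(sigma, eps):
    """Variance of ESN(mu, sigma, eps)."""
    return sigma ** 2 * ((3.0 * math.pi - 8.0) * eps ** 2 + math.pi) / math.pi

# With eps = 0 the density reduces to the normal density.
print(esn_pdf(0.3), math.exp(-0.5 * 0.3 ** 2) / SQRT_2PI)
# The probability mass to the left of the mode is (1 + eps) / 2.
print(esn_cdf(0.0, eps=0.5))
```

Setting ε = 0 recovers the normal model, which is why the normal curve appears as the reference case in the figures.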

Epsilon Skew Normal General Linear Model (ESNGLM)
The relationship between the dependent variable (Y_i), a number of independent variables (X_i) and the random error (U_i) can be represented using the following linear equation: $Y = X\beta + U$ … (6) where Y is an (n × 1) vector of observations of the dependent variable, X is an (n × (K + 1)) matrix of observations of the independent variables, and β is a ((K + 1) × 1) vector of regression parameters. Note that the first element of β represents the constant term, and U is an (n × 1) vector of random errors that follows the epsilon skew normal distribution, U ~ ESN (μ, σ, ε). When the error term of a regression model follows the ESN distribution, its mean is as stated in equation (3). Since the error expectation is a constant, it must be multiplied by a vector of ones (I), because here we are dealing with a vector of random errors. Hence, the estimation formula of model (6), referred to below as equation (7), includes this mean term multiplied by I.
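To make the role of the non-zero error mean concrete, the following sketch simulates ESN errors and compares their sample mean with the theoretical mean. It assumes the Mudholkar and Hutson (2000) parametrization, for which the mean is μ - 4σε/√(2π); the parameter values, the coefficient vector and the function name are hypothetical and chosen only for illustration.

```python
import math
import random

def esn_sample(rng, mu, sigma, eps):
    """Draw one ESN variate as a mixture of two scaled half-normals (assumed form)."""
    half_normal = abs(rng.gauss(0.0, 1.0))
    if rng.random() < (1.0 + eps) / 2.0:        # mass to the left of mu
        return mu - sigma * (1.0 + eps) * half_normal
    return mu + sigma * (1.0 - eps) * half_normal

rng = random.Random(0)
mu, sigma, eps = 0.0, 1.0, -0.5                 # hypothetical error parameters
beta = [10.0, 2.0, -1.0]                        # hypothetical coefficients (constant first)
n = 100_000

errors = [esn_sample(rng, mu, sigma, eps) for _ in range(n)]
error_mean = sum(errors) / n
theoretical_mean = mu - 4.0 * sigma * eps / math.sqrt(2.0 * math.pi)

# Model (6) row by row: y_i = x_i' beta + u_i, with a leading 1 for the constant term.
x_rows = [[1.0, rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta, row)) + u for row, u in zip(x_rows, errors)]

print(error_mean, theoretical_mean)
```

Because the errors do not average to zero, the expectation of Y is Xβ plus a constant, which is the reason the mean term is carried along with a vector of ones.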

Adaptive Least Square Estimates
Traditional methods are known to be sensitive to long tails [18]. The least squares method is therefore adapted to the case of an asymmetric (skewed) error distribution by taking the error distribution to be ESN (μ, σ, ε), which makes the model more robust [19]. The method works by minimizing the residual sum of squares. To obtain the adaptive least squares (ALS) estimates for ESNGLM, the following steps are applied. Since the ALS estimates for ESNGLM are based on matrices and vectors, the sum of squared errors is written in matrix form; substituting equation (7) into it gives equation (10), where Y'I = I'Y, (Xβ)'I = I'Xβ and (Xβ)'Y = Y'(Xβ). Differentiating (10) with respect to β, μ, σ and ε, respectively, gives equations (11), (12), (13) and (14). Setting each of these equations equal to 0 yields the estimating formulas for β, μ, σ and ε, ending with equation (18). Reviewing these formulas, in particular those for μ, σ and ε, shows that none of them has a closed form: their values can be obtained only by numerical methods of estimation [16].

Trust-Region-Dogleg Algorithm
This algorithm solves systems of linear and non-linear equations numerically. It is based on Newton's method and a Taylor series expansion, and it reduces the error iteratively, with up to 600 iterations [20]. The Jacobian matrix contains the partial derivatives with respect to each parameter of ESNGLM, as shown in equation (20), where k is the iteration number and d_k represents the change between the new estimate (iteration k + 1) and the previous one (iteration k) for each parameter. Here J(X_k) holds the partial derivatives with respect to the model parameters to be estimated, and F(X_k) is the vector of equations in the parameters to be estimated, evaluated at the current values X_k.
Figure 4 illustrates the mechanism of this algorithm.
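The iteration described in this section can be illustrated with a small, self-contained trust-region dogleg solver. This is a textbook-style sketch and not MATLAB's implementation: the radius update constants, the acceptance rule and the two-equation test system are assumptions made for illustration, and only the iteration cap of 600 is taken from the text.

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def matvec(A, v):
    return [sum(a * t for a, t in zip(row, v)) for row in A]

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting; returns None if A is singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-14:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * t for a, t in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def dogleg_step(F, J, delta):
    """Step d_k that approximately minimizes ||F + J d|| subject to ||d|| <= delta."""
    n = len(F)
    d_newton = solve_linear(J, [-f for f in F])            # full Newton step
    if d_newton is not None and norm(d_newton) <= delta:
        return d_newton
    g = [sum(J[i][j] * F[i] for i in range(n)) for j in range(n)]   # gradient J'F
    Jg = matvec(J, g)
    gg, JgJg = sum(t * t for t in g), sum(t * t for t in Jg)
    if JgJg == 0.0:
        return [0.0] * n
    d_cauchy = [-(gg / JgJg) * t for t in g]               # steepest-descent minimizer
    if d_newton is None or norm(d_cauchy) >= delta:
        scale = min(1.0, delta / norm(d_cauchy))
        return [scale * t for t in d_cauchy]
    # Move from the Cauchy point towards the Newton point until the boundary is hit.
    w = [a - b for a, b in zip(d_newton, d_cauchy)]
    a = sum(t * t for t in w)
    b = 2.0 * sum(p * q for p, q in zip(d_cauchy, w))
    c = sum(t * t for t in d_cauchy) - delta ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [p + tau * q for p, q in zip(d_cauchy, w)]

def trust_region_dogleg(fun, jac, x0, delta=1.0, delta_max=100.0, tol=1e-10, max_iter=600):
    x = list(x0)
    for _ in range(max_iter):
        F = fun(x)
        if norm(F) < tol:
            break
        J = jac(x)
        d = dogleg_step(F, J, delta)
        x_new = [a + b for a, b in zip(x, d)]
        linear = [f + t for f, t in zip(F, matvec(J, d))]
        actual = norm(F) ** 2 - norm(fun(x_new)) ** 2       # true reduction
        predicted = norm(F) ** 2 - norm(linear) ** 2        # reduction of the linear model
        rho = actual / predicted if predicted > 0.0 else 0.0
        if rho < 0.25:
            delta *= 0.25                                   # poor agreement: shrink region
        elif rho > 0.75 and norm(d) >= 0.99 * delta:
            delta = min(2.0 * delta, delta_max)             # good agreement at the boundary
        if rho > 0.0:
            x = x_new                                       # accept only if the error decreased
    return x

# Hypothetical test system: x^2 + y^2 = 4 and x = y.
fun = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]
jac = lambda v: [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]
root = trust_region_dogleg(fun, jac, [1.0, 2.0])
print(root)
```

In the ESNGLM setting, `fun` would return the estimating equations for the parameters and `jac` their partial derivatives; both are left out here because their explicit forms are not reproduced in this text.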

Application
This research examined data obtained from the Iraqi Central Bank [22]. The data include three variables. The first is the dependent variable (Y_i), which is the monthly exchange rate of the dollar during 2006 to 2016. In a general sense, the exchange rate is the price of one unit of foreign currency in local currency, such as dollar versus dinar. The second variable is the first explanatory variable (X_i1), which refers to inflation rates: economic inflation means a continuous rise in the prices of imported goods, and a rise in inflation rates reduces the value of the local currency against foreign currencies. This, in turn, increases the exchange rate of the foreign currency (dollar). The third variable is the second explanatory variable (X_i2), which refers to the ratio of currency to the money supply, where the money supply represents the amount of money in circulation among people in the society. To determine whether the dependent variable data follow their assumed distribution, that is ESN, the Kolmogorov-Smirnov test was conducted using MATLAB software [20]. This test is based on two hypotheses:
1. Null hypothesis, which states that the sample data follow the ESN distribution.
2. Alternative hypothesis, which states that the sample data do not follow the ESN distribution.
The test was conducted by calculating the values of P and H. The P-value measures how compatible the data are with their assumed distribution (ESN), while the H-value indicates the test decision. The test gave a P-value of 0.3879, which is greater than 0.05. The H-value is 0, which means that the null hypothesis H_0 cannot be rejected, so the data are consistent with the ESN distribution. After arranging the data of the dependent variable using the Stem and Leaf method, they were found to be right skewed because they have a longer tail on the right side than on the other side, as shown in Figure-5.
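The goodness-of-fit check described above can be sketched as a one-sample Kolmogorov-Smirnov test against an ESN distribution function. The sketch assumes the Mudholkar and Hutson (2000) parametrization for the CDF and uses the standard asymptotic series for the P-value; it is not the MATLAB routine used in the paper. The sample is synthetic (exact ESN quantiles for hypothetical parameter values), not the exchange rate data.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def esn_cdf(x, mu, sigma, eps):
    """ESN distribution function (assumed Mudholkar-Hutson form)."""
    z = (x - mu) / sigma
    if z < 0:
        return (1.0 + eps) * STD_NORMAL.cdf(z / (1.0 + eps))
    return eps + (1.0 - eps) * STD_NORMAL.cdf(z / (1.0 - eps))

def esn_quantile(p, mu, sigma, eps):
    """Inverse of esn_cdf for 0 < p < 1."""
    if p < (1.0 + eps) / 2.0:
        return mu + sigma * (1.0 + eps) * STD_NORMAL.inv_cdf(p / (1.0 + eps))
    return mu + sigma * (1.0 - eps) * STD_NORMAL.inv_cdf((p - eps) / (1.0 - eps))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

def ks_pvalue(d, n):
    """Asymptotic P-value from the Kolmogorov series with a small-sample correction."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 1e-3:
        return 1.0
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 else -term
        if term < 1e-12:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))

mu, sigma, eps = 0.0, 1.0, -0.3      # hypothetical parameter values
n = 50
sample = [esn_quantile((i - 0.5) / n, mu, sigma, eps) for i in range(1, n + 1)]

d_stat = ks_statistic(sample, lambda x: esn_cdf(x, mu, sigma, eps))
p_value = ks_pvalue(d_stat, n)
h_value = int(p_value < 0.05)        # 0 means the null hypothesis is not rejected
print(d_stat, p_value, h_value)
```

When the ESN parameters are estimated from the same data that are being tested, the P-value from this series tends to be optimistic, so it is best read as a rough guide.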

Estimation Results of Epsilon Skew Normal General Linear Model
This section calculates the estimated values of the Epsilon Skew Normal General Linear Model (ESNGLM) parameters using the ALS method. It also includes the estimation of the General Linear Model (GLM) parameters using the OLS method, so that the two models can be compared using R². After using MATLAB software and applying the Trust-Region-Dogleg numerical algorithm, the estimated values for ESNGLM and GLM, along with the value of R² for each model, are shown in Table-1. Reasonable initial values for the parameters (μ, σ, ε) are provided by the least squares estimators (symmetric case ε = 0), together with the parameter bounds (-1 < ε < 1) [17]. Some authors have gone further and used medians as initial values when the goal is a single value that reflects the location of most observations in a skewed distribution [23]. Values of the standard error (SE) for each parameter of the two models are shown in Table-2.
Based on Table-1, the coefficient of determination of ESNGLM is greater than that of GLM, which indicates that the linear model with ESN error distribution (ESNGLM) fits better than the linear model with normal error distribution (GLM). Thus, ESNGLM represents a satisfactory model to a large extent. This also means that the independent variables (X_i1) and (X_i2) explain 0.9974 of the variation in the dependent variable (Ŷ_i) for ESNGLM. From equation (13), the inflation rate (X_i1) affects the dollar exchange rate (Ŷ_i) by 6.8138: if X_i1 increases by one unit with X_i2 held constant, Ŷ_i increases by 6.8138. The ratio of currency to the money supply (X_i2) affects the exchange rate (Ŷ_i) by 4.8515: if X_i2 increases by one unit with X_i1 held constant, Ŷ_i increases by 4.8515.
Table-1 also shows that the estimated values of the regression parameters (b_1, b_2) are equal in ESNGLM (ALS_ESN) and GLM (OLS_N). This reflects the equivalence of the estimating formulas for these parameters in the two models. Table-2 shows that the SE of the ESNGLM parameters is close to zero, which indicates the accuracy of the estimation.
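The equality of the slope estimates has a simple least squares explanation: adding a constant to every response changes only the fitted constant term. The sketch below checks this numerically with ordinary least squares on a small synthetic data set. The data, the shift value standing in for a non-zero error mean, and the helper names are all hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * t for a, t in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)

# Synthetic design: a column of ones for the constant term plus two regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
noise = [0.10, -0.20, 0.05, 0.00, -0.10, 0.15]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
y = [2.0 + 3.0 * a - 1.5 * b + e for a, b, e in zip(x1, x2, noise)]

shift = 0.8   # hypothetical non-zero error mean added to every response
b_plain = ols(X, y)
b_shift = ols(X, [yi + shift for yi in y])

print(b_plain)
print(b_shift)   # same slopes; only the constant term moves by the shift
```

Under this reading, a non-zero error mean cannot be separated from the constant term by the regression part of the model alone, which is consistent with b_1 and b_2 being identical across the two fits.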

Conclusions
In this paper, estimates of the ESNGLM parameters were calculated using the ALS method, and estimates of the GLM parameters were calculated using the OLS method. Both methods are based on minimizing the sum of squared errors with respect to each parameter.