A New Bayesian Group Bridge to Solve the Tobit Model

In this paper, we propose a new regularization approach for left-censored (Tobit) data. Specifically, we propose a new Bayesian group bridge for left-censored regression (BGBRLC). We develop a new Bayesian hierarchical model and a new Gibbs sampler for posterior sampling. The results show that the new approach performs very well compared with some existing approaches.


1-Introduction
Left-censored regression is a statistical method in which the observed response variable is censored from below. Such data arise in many areas, including agriculture, genetics, the environment and medicine. Left-censored regression is a useful tool for assessing the relationship between a set of explanatory variables and a dependent variable. An important problem in left-censored regression arises when the number of explanatory variables is very large, which makes it difficult to identify the important variables. Most research therefore focuses on variable selection to obtain an appropriate model. Traditional methods of variable selection include Mallows' $C_p$, suggested by Mallows [1]: $C_p = \mathrm{SSE}_p/\hat{\sigma}^2 - n + 2p$, where $\hat{\sigma}^2$ is the mean square error of the model, $\mathrm{SSE}_p$ is the residual sum of squares, $p$ is the number of covariates in the model, and $n$ is the sample size. Small values of $C_p$ indicate that the model is relatively accurate. Woodroofe [2] showed that $C_p$ tends to select a conservative model. Nishii [3] showed that $C_p$ is inconsistent in selecting the right model and often selects an overly large model as the sample size grows.

Aljanabi and Alhamzawi
Iraqi Journal of Science, 2020, Special Issue, pp: 215-222

Akaike [4] proposed the Akaike information criterion (AIC), defined by $\mathrm{AIC} = -2\log L + 2p$, where $L$ is the maximized likelihood function. Javed and Mantalos [5] showed that the model selected by AIC is inconsistent when the sample size is large. To address this issue, Schwarz [6] presented the Bayesian information criterion (BIC), $\mathrm{BIC} = -2\log L + p\log n$. This criterion overcomes the inconsistency of AIC and selects a model with good properties. However, when the number of covariates is large, this criterion cannot handle the variable selection problem. George and McCulloch [7] presented stochastic search variable selection (SSVS) as an attractive way to select a subset of covariates using a mixture of prior distributions that allows some coefficients to be set to zero. One disadvantage of this approach is that it is time consuming; in addition, in high-dimensional data the algorithm may fail to visit the correct model.
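As a minimal sketch, the three criteria above can be computed from quantities that a fitted model supplies. The function names are ours, $\hat{\sigma}^2$ is assumed to be supplied by the caller, and the numbers in the usage lines are hypothetical.

```python
import math


def mallows_cp(sse_p: float, sigma2_hat: float, n: int, p: int) -> float:
    """Mallows' C_p = SSE_p / sigma2_hat - n + 2p (smaller is better)."""
    return sse_p / sigma2_hat - n + 2 * p


def aic(log_lik: float, p: int) -> float:
    """AIC = -2 log L + 2p, with log L the maximised log-likelihood."""
    return -2.0 * log_lik + 2 * p


def bic(log_lik: float, p: int, n: int) -> float:
    """BIC = -2 log L + p log n; the per-parameter penalty grows with n."""
    return -2.0 * log_lik + p * math.log(n)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(mallows_cp(sse_p=120.0, sigma2_hat=1.25, n=100, p=5))  # 6.0
    print(aic(log_lik=-50.0, p=3))                               # 106.0
    print(round(bic(log_lik=-50.0, p=3, n=100), 3))              # 113.816
```

For $n = 100$ the BIC charges $\log 100 \approx 4.6$ per parameter against AIC's 2, which is why BIC favours smaller models as $n$ grows.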
Recently, regularization methods have become popular because they simultaneously select variables and estimate the important coefficients; see, for example, Hans [8], Li et al. [9], Alhamzawi and Yu [10], Tibshirani [11], Liu et al. [12], Alhamzawi [13], Mallick and Yi [14], Xu and Ghosh [15], and Alhamzawi and Ali [16]. The general form of regularization methods is
$$\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta}\left\{\sum_{i=1}^{n}(y_i - \mathbf{x}_i^{\top}\boldsymbol\beta)^2 + P_\lambda(\boldsymbol\beta)\right\},$$
where $P_\lambda(\boldsymbol\beta)$ is a penalty function of the model coefficients whose degree of penalization is controlled by the tuning parameter $\lambda$. Hoerl and Kennard [17] proposed ridge regression, $P_\lambda(\boldsymbol\beta) = \lambda\sum_j\beta_j^2$, which has better predictive performance than the ordinary least squares (OLS) estimates, with lower variance. However, ridge regression cannot produce a parsimonious model, because it always retains all predictors in the model. Frank and Friedman [18] suggested bridge regression, $P_\lambda(\boldsymbol\beta) = \lambda\sum_j|\beta_j|^{\gamma}$ with $\gamma > 0$, which has attractive features such as the oracle property and unbiasedness, and performs variable selection and parameter estimation simultaneously; however, its approximate covariance matrix and bootstrap standard errors are unstable.
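The general penalized criterion above can be evaluated directly. The short Python sketch below (function names are ours) codes the residual sum of squares and the ridge and bridge penalties; setting $\gamma = 1$ in the bridge penalty gives the lasso penalty and $\gamma = 2$ gives ridge. It only evaluates the objective for a given $\boldsymbol\beta$ and does not minimize it.

```python
def rss(y, X, beta):
    """Residual sum of squares: sum_i (y_i - x_i^T beta)^2."""
    return sum((yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
               for yi, xi in zip(y, X))


def ridge_penalty(beta, lam):
    """Ridge penalty: lam * sum_j beta_j^2."""
    return lam * sum(b * b for b in beta)


def bridge_penalty(beta, lam, gamma):
    """Bridge penalty: lam * sum_j |beta_j|^gamma (gamma=1 lasso, gamma=2 ridge)."""
    return lam * sum(abs(b) ** gamma for b in beta)


def penalized_objective(y, X, beta, penalty):
    """RSS(beta) + P_lambda(beta) for any penalty given as a function of beta."""
    return rss(y, X, beta) + penalty(beta)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    beta = [1.0, 2.0]  # fits y exactly, so only the penalty remains
    print(penalized_objective(y, X, beta, lambda b: ridge_penalty(b, 0.5)))  # 2.5
```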
Tibshirani [19] proposed lasso regression, $P_\lambda(\boldsymbol\beta) = \lambda\sum_j|\beta_j|$, which automatically selects the important variables by shrinking the coefficients of unimportant variables exactly to zero.
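To see why the lasso can set coefficients exactly to zero, consider the textbook special case of an orthonormal design ($X^{\top}X = I$). The problem $\min_{\boldsymbol\beta}\|\mathbf{y} - X\boldsymbol\beta\|^2 + \lambda\sum_j|\beta_j|$ then separates into one-dimensional problems $\min_b (z_j - b)^2 + \lambda|b|$ with $z_j = \mathbf{x}_j^{\top}\mathbf{y}$. The sketch below covers that special case only and is not a general lasso solver; the function name is ours.

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """Minimiser of (z - b)^2 + lam * |b|: sign(z) * max(|z| - lam/2, 0).

    Any |z| <= lam/2 is mapped exactly to zero, which is how the lasso
    removes weak predictors under an orthonormal design.
    """
    return math.copysign(max(abs(z) - lam / 2.0, 0.0), z)


if __name__ == "__main__":
    print([soft_threshold(z, 2.0) for z in (3.0, 0.5, -3.0)])  # [2.0, 0.0, -2.0]
```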
In recent years, researchers have focused on selecting influential groups of variables. Yuan and Lin [20] proposed the group lasso, which was extended by Kim et al. [21] to general loss functions. However, the group lasso cannot perform bi-level selection: it selects or removes whole groups, but it cannot select individual variables within a selected group.
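The lack of within-group selection can be seen from the group lasso update for a single group under an orthonormal design, which solves $\min_{\mathbf{b}}\tfrac12\|\mathbf{z} - \mathbf{b}\|_2^2 + t\|\mathbf{b}\|_2$. This is an unweighted form chosen for illustration (group-size weights are omitted), and the function name is hypothetical. A retained group is only rescaled, so small coefficients inside it remain nonzero.

```python
import math


def group_soft_threshold(z, t):
    """Minimiser of 0.5 * ||z - b||_2^2 + t * ||b||_2 for one group.

    The whole group is either set to zero (when ||z||_2 <= t) or shrunk
    by a common factor, so no single coefficient inside a retained group
    is zeroed out.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= t:
        return [0.0 for _ in z]
    factor = 1.0 - t / norm
    return [factor * v for v in z]


if __name__ == "__main__":
    print(group_soft_threshold([3.0, 0.1], 1.0))  # both entries stay nonzero
    print(group_soft_threshold([0.5, 0.2], 1.0))  # [0.0, 0.0]
```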
Huang et al. [22] proposed group bridge regression, which is capable of bi-level variable selection (selecting the important groups and the important variables within those groups) and enjoys the oracle and sparsity properties [23,24].
Aljanabi and Alhamzawi [25] (accepted paper) proposed a new Bayesian group lasso for left-censored regression models for simultaneous variable selection and parameter estimation; their simulation studies and data analysis showed that the proposed method performed better than the other approaches.
In this research, we propose a new Bayesian group bridge for left-censored data and implement a new Gibbs sampler for variable selection. Simulation studies and a real data analysis show that the new approach performs very well in comparison with existing methods.
In Section 2, we provide an overview of the left-censored model. In Section 3, we describe the Bayesian group bridge regression for left-censored data and present a new Bayesian hierarchical model. In Section 4, we carry out Monte Carlo simulations to demonstrate the performance of the proposed method. In Section 5, we analyze real data, and in Section 6 we draw conclusions.

Methods
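Consider the left-censored (Tobit) model
$$y_i^* = \mathbf{x}_i^{\top}\boldsymbol\beta + \varepsilon_i,\quad \varepsilon_i \sim N(0,\sigma^2),\quad y_i = \max\{c,\ y_i^*\},$$
where $y_i^*$ is a latent response, $c$ is a left-censoring point, and $i = 1,\dots,n$.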
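For reference, the log-likelihood implied by the model above, assuming normal errors as written, can be sketched in Python as follows. `tobit_loglik` is a name we introduce, and observations at or below $c$ are treated as censored. For extreme linear predictors the censored term can underflow, which a production version would handle with a log-CDF routine.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def tobit_loglik(y, X, beta, sigma, c=0.0):
    """Log-likelihood of the left-censored normal (Tobit) model.

    Censored observations (y_i <= c) contribute log Phi((c - x_i^T beta) / sigma);
    the others contribute log phi((y_i - x_i^T beta) / sigma) - log sigma.
    """
    total = 0.0
    for yi, xi in zip(y, X):
        mu = sum(a * b for a, b in zip(xi, beta))  # linear predictor x_i^T beta
        if yi <= c:
            total += math.log(STD_NORMAL.cdf((c - mu) / sigma))
        else:
            total += math.log(STD_NORMAL.pdf((yi - mu) / sigma)) - math.log(sigma)
    return total
```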

Bayesian Group Bridge For Left-censored model
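Huang et al. [22] showed that the group bridge is able to select the important groups and to select important variables within each group through the penalty $\sum_{k=1}^{K}\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha}$, where $\lambda_k > 0$ are group-specific tuning parameters and $\alpha \in (0,1)$ denotes the concavity parameter. Introducing group-specific tuning parameters pools the information among the variables within a group and allows the amount of shrinkage to adapt to each group. Despite these desirable characteristics, the frequentist group bridge does not provide correct or reliable standard errors [22]. The Bayesian approach overcomes this disadvantage and provides standard errors. Following Huang et al. [22], the group bridge for censored data can be written as
$$\min_{\boldsymbol\beta}\ \sum_{i=1}^{n}(y_i^* - \mathbf{x}_i^{\top}\boldsymbol\beta)^2 + \sum_{k=1}^{K}\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha},$$
where $\boldsymbol\beta = (\boldsymbol\beta_1^{\top},\dots,\boldsymbol\beta_K^{\top})^{\top}$, $K$ is the number of groups, and $\|\boldsymbol\beta_k\|_1$ is the $L_1$ norm of $\boldsymbol\beta_k$. We use a scale mixture of uniforms (SMU) to represent the generalized Gaussian (GG) prior, which makes the Markov chain Monte Carlo (MCMC) algorithm computationally efficient. The conditional GG prior distribution of $\boldsymbol\beta_k$ given by Mallick and Yi [26] is
$$\pi(\boldsymbol\beta_k \mid \lambda_k, \alpha) \propto \exp\left(-\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha}\right).$$
The most important step in the Bayesian approach is specifying the prior distributions of the parameters. This choice must be made carefully, because a poor choice can lead to many problems, as previously shown by Kenny and Donson [27], Alhamzawi and Yu [28], and Alhamzawi and Ali [29]. Following Mallick and Yi [26], we set the prior distribution of $\boldsymbol\beta$ as
$$\pi(\boldsymbol\beta \mid \lambda_1,\dots,\lambda_K, \alpha) = \prod_{k=1}^{K} C(\lambda_k,\alpha)\exp\left(-\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha}\right),$$
where $C(\lambda_k,\alpha) = \lambda_k^{m_k/\alpha}\,\Gamma(m_k+1)/\{2^{m_k}\,\Gamma(m_k/\alpha+1)\}$ is the normalizing constant and $m_k$ is the number of coefficients in group $k$.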
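The normalizing constant can be checked by hand: for a group of size $m_k$, integrating $\exp(-\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha})$ over $\mathbb{R}^{m_k}$ gives $2^{m_k}\lambda_k^{-m_k/\alpha}\Gamma(m_k/\alpha+1)/\Gamma(m_k+1)$. The Python sketch below (our own function names) evaluates the log prior with that constant together with the group bridge penalty. With $\alpha = 1$ and $m_k = 1$ it reduces to the Laplace density $(\lambda_k/2)e^{-\lambda_k|\beta|}$, a convenient sanity check.

```python
import math


def gg_log_prior(beta_k, lam_k, alpha):
    """log pi(beta_k | lam_k, alpha) for the group generalized Gaussian prior.

    pi = C * exp(-lam_k * ||beta_k||_1^alpha), with
    C = lam_k^(m/alpha) * Gamma(m + 1) / (2^m * Gamma(m/alpha + 1)),
    where m = len(beta_k) is the group size.
    """
    m = len(beta_k)
    l1 = sum(abs(b) for b in beta_k)
    log_c = ((m / alpha) * math.log(lam_k) + math.lgamma(m + 1)
             - m * math.log(2.0) - math.lgamma(m / alpha + 1))
    return log_c - lam_k * l1 ** alpha


def group_bridge_penalty(groups, lams, alpha):
    """sum_k lam_k * ||beta_k||_1^alpha over the K groups."""
    return sum(lam * sum(abs(b) for b in g) ** alpha
               for g, lam in zip(groups, lams))
```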
In the present study, we rewrite the above prior through a change of variables that leads to its SMU representation, following Mallick and Yi [26].
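The sketch below illustrates the SMU representation under our own reading of Mallick and Yi [26], which may differ in detail from the hierarchy used here: draw a mixing variable $u_k \sim \text{Gamma}(m_k/\alpha+1,\ \text{rate } \lambda_k)$, then draw $\boldsymbol\beta_k$ uniformly from $\{\|\boldsymbol\beta_k\|_1^{\alpha} < u_k\}$. Integrating $u_k$ out recovers the GG prior $\propto\exp(-\lambda_k\|\boldsymbol\beta_k\|_1^{\alpha})$. The function name and the Monte Carlo check are ours; the check uses the fact that under the GG prior $\|\boldsymbol\beta_k\|_1^{\alpha} \sim \text{Gamma}(m_k/\alpha,\ \text{rate } \lambda_k)$, with mean $m_k/(\alpha\lambda_k)$.

```python
import random


def sample_gg_smu(m, lam, alpha, rng):
    """Draw one group beta_k (length m) from the GG prior via its SMU form.

    u ~ Gamma(shape = m/alpha + 1, rate = lam), then
    beta_k | u ~ Uniform{ ||beta_k||_1 < u^(1/alpha) }.
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    u = rng.gammavariate(m / alpha + 1.0, 1.0 / lam)
    radius = u ** (1.0 / alpha)
    # Uniform point in the L1 ball: the first m coordinates of a flat
    # Dirichlet(1, ..., 1) vector of length m + 1, with random signs.
    e = [rng.expovariate(1.0) for _ in range(m + 1)]
    total = sum(e)
    return [radius * (ei / total) * (1.0 if rng.random() < 0.5 else -1.0)
            for ei in e[:m]]


if __name__ == "__main__":
    rng = random.Random(2020)
    m, lam, alpha, n_draws = 2, 1.0, 0.5, 50_000
    draws = [sample_gg_smu(m, lam, alpha, rng) for _ in range(n_draws)]
    # Theoretical mean of ||beta_k||_1^alpha is m / (alpha * lam) = 4.0 here.
    mean_t = sum(sum(abs(b) for b in d) ** alpha for d in draws) / n_draws
    print(round(mean_t, 2))
```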

Hierarchical Representation
We construct our Bayesian hierarchical model following the hierarchical model of Mallick and Yi [26], as follows:

4-The full conditional posterior distribution of is

5-The full conditional posterior distribution of is
where I(.) is an indicator function.
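The indicator in item 5 signals a truncated full conditional. As an illustration only, assume the SMU hierarchy of Mallick and Yi [26] with mixing variable $u_k \sim \text{Gamma}(m_k/\alpha+1,\ \text{rate } \lambda_k)$ and $\boldsymbol\beta_k \mid u_k$ uniform on $\{\|\boldsymbol\beta_k\|_1^{\alpha} < u_k\}$. The uniform density is proportional to $u_k^{-m_k/\alpha}$ and cancels the matching power in the gamma density, leaving $p(u_k \mid \boldsymbol\beta_k, \lambda_k) \propto e^{-\lambda_k u_k}\, I(u_k > \|\boldsymbol\beta_k\|_1^{\alpha})$. Whether item 5 refers to this variable is our assumption, and the helper name is hypothetical.

```python
import random


def sample_u_k(beta_k, lam_k, alpha, rng):
    """Draw u_k ~ Exp(rate lam_k) restricted to u_k > ||beta_k||_1^alpha.

    By the memoryless property of the exponential distribution, this equals
    the lower bound plus an ordinary Exp(lam_k) draw.
    """
    lower = sum(abs(b) for b in beta_k) ** alpha
    return lower + rng.expovariate(lam_k)  # expovariate takes the rate


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_u_k([0.5, -0.5], lam_k=2.0, alpha=0.5, rng=rng))  # always > 1.0
```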

Posterior Computation
In this section, following Mallick and Yi [26], we develop a Gibbs sampling algorithm that updates the latent variables and the other parameters according to the following steps:
ii. Generate the latent mixing variables from their full conditional distribution, which involves $\|\boldsymbol\beta_k\|_1$.
iii. Generate $\boldsymbol\beta$ from the multivariate normal distribution with the mean and variance of its full conditional distribution.
iv. Generate $\sigma^2$ from the inverse gamma distribution with the shape and rate parameters of its full conditional distribution.
v. Generate the tuning parameters from the gamma distribution with the shape and rate parameters of their full conditional distribution.
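In Tobit models, the latent variables usually include the uncensored versions $y_i^*$ of the censored responses. The standard data-augmentation step draws each one from $N(\mathbf{x}_i^{\top}\boldsymbol\beta, \sigma^2)$ truncated to $(-\infty, c]$, and we assume that step is part of the update here. The Python sketch below (hypothetical helper names) implements that draw by inverse-CDF sampling. It also shows the shape/rate parameterization needed for steps iv and v, because Python's `random.gammavariate` takes a scale rather than a rate. The actual shape and rate values come from the full conditionals and are passed in as arguments.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_censored_latent(mu, sigma, c, rng):
    """Draw y* ~ N(mu, sigma^2) truncated to (-inf, c] by inverse-CDF sampling."""
    p = STD_NORMAL.cdf((c - mu) / sigma)     # P(y* <= c)
    v = p * (1.0 - rng.random())             # uniform on (0, p]
    v = min(max(v, 1e-300), 1.0 - 1e-16)     # stay inside inv_cdf's open domain
    return mu + sigma * STD_NORMAL.inv_cdf(v)


def augment_latent(y, X, beta, sigma, c, rng):
    """Replace each censored y_i (y_i <= c) by a latent draw; keep observed values."""
    out = []
    for yi, xi in zip(y, X):
        mu = sum(a * b for a, b in zip(xi, beta))
        out.append(draw_censored_latent(mu, sigma, c, rng) if yi <= c else yi)
    return out


def draw_inverse_gamma(shape, rate, rng):
    """sigma^2 ~ IG(shape, rate): reciprocal of a Gamma(shape, rate) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)  # gammavariate uses scale


def draw_gamma(shape, rate, rng):
    """Gamma(shape, rate) draw; gammavariate expects scale = 1 / rate."""
    return rng.gammavariate(shape, 1.0 / rate)


if __name__ == "__main__":
    rng = random.Random(42)
    print(augment_latent([0.0, 2.5], [[1.0], [1.0]], [1.0], 1.0, 0.0, rng))
```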

Simulation Study
Here, we carry out Monte Carlo simulations to demonstrate the performance of the proposed Bayesian group bridge regression for left-censored data (BGBRLC). The BGBRLC is compared with frequentist left-censored regression (FLCR), Bayesian regression for left-censored data (BRLC), Bayesian lasso regression for left-censored data (BLRLCR) and Bayesian group lasso regression for left-censored data (BGLRLC). These methods are evaluated by the median of mean absolute deviations (MMAD) over 1000 simulations. The convergence of the BGBRLC algorithm is checked using trace plots and histograms of the posterior samples of the regression parameters. The data are simulated from the left-censored model $y_i = \max\{c,\ \mathbf{x}_i^{\top}\boldsymbol\beta + \varepsilon_i\}$. We generate 50 observations from this model, where $\mathbf{x}_i^{\top}$ is the $i$th row of the design matrix $X$ of 8 predictors. The rows of $X$ are simulated independently from $N(\mathbf{0}, \Sigma)$, where the $(i,j)$th element of $\Sigma$ is 0.95. The true regression coefficients, including the intercept term, are divided into three groups. The MMAD and SD results are summarized in Table-1, which shows that the proposed method outperforms the other approaches. Table-2 further shows that the proposed approach produces estimates much closer to the true regression coefficients than those produced by the other methods. The trace plots for the simulation study are summarized in Figure-1, which shows that the BGBRLC samples traverse the posterior space very readily.
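The MMAD criterion can be sketched as follows, assuming the definition common in this literature: for replication $r$, $\mathrm{MAD}_r = n^{-1}\sum_{i=1}^{n}|\mathbf{x}_i^{\top}\hat{\boldsymbol\beta}_r - \mathbf{x}_i^{\top}\boldsymbol\beta|$, and MMAD is the median of $\mathrm{MAD}_r$ over the replications. Each replication carries its own design matrix, since $X$ is regenerated in every simulated data set. The function names are ours.

```python
from statistics import median


def mad(X, beta_hat, beta_true):
    """(1/n) * sum_i |x_i^T beta_hat - x_i^T beta_true| for one data set."""
    n = len(X)
    return sum(abs(sum(x * (bh - bt) for x, bh, bt in zip(xi, beta_hat, beta_true)))
               for xi in X) / n


def mmad(replications, beta_true):
    """Median of the per-replication MADs; replications = [(X_r, beta_hat_r), ...]."""
    return median(mad(X_r, b_r, beta_true) for X_r, b_r in replications)


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    reps = [(X, [1.0, 1.0]), (X, [2.0, 1.0]), (X, [1.0, 3.0])]  # toy estimates
    print(mmad(reps, [1.0, 1.0]))  # MADs 0.0, 0.5, 1.0 -> 0.5
```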

Real Data
Here, the proposed approach is illustrated with data on active sperm. The dataset has 200 observations on 8 variables: the response variable is the count of active sperm, and the other seven variables are covariates, as shown below.
$y$ is the count of active sperm; the normal sperm count is 60-150 × 10^6.
Varicocele: 0 if the person suffers from varicocele and 1 otherwise.
Smoking: 0 if the person smokes and 1 otherwise.
Table-3 lists the results of the real data example. To evaluate the methods, the deviance information criterion (DIC) was computed for the five approaches (BGBRLC, BGLRLC, FLCR, FRLCR and BLRLCR), giving values of 1710.291, 1713.448, 1822.805, 1817.453 and 1836.229, respectively. Since smaller DIC values indicate a better model, these results show that BGBRLC performs better than the other approaches.
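For completeness, DIC is $\bar{D} + p_D$, where $D(\theta) = -2\log L(\theta)$ is the deviance, $\bar{D}$ is its posterior mean, and $p_D = \bar{D} - D(\bar{\theta})$ is the effective number of parameters; smaller values are preferred. The sketch below (function name ours) computes DIC from MCMC output, and the usage lines only rank the DIC values reported above.

```python
def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = Dbar + pD, with Dbar the posterior mean deviance and
    pD = Dbar - D(theta_bar) the effective number of parameters."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


if __name__ == "__main__":
    reported = {"BGBRLC": 1710.291, "BGLRLC": 1713.448, "FLCR": 1822.805,
                "FRLCR": 1817.453, "BLRLCR": 1836.229}
    print(min(reported, key=reported.get))  # BGBRLC
```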

Conclusions
In this paper, we proposed a Bayesian group bridge for left-censored regression, applied it to simulated examples, and analyzed real data using the R software. We compared the proposed method with other methods, and the results showed that the new method performs better than some existing methods. The new method can easily be extended to other settings, such as a Bayesian group bridge for binary data and for right- and interval-censored data.