The Use of Parametric and Nonparametric Methods to Study the Effects of Smoking on High-Density Lipoprotein Cholesterol

Analysis of variance (ANOVA) is one of the most widely used methods in statistics to analyze the behavior of one variable compared to another. The data were collected from a sample of 65 adult males who were nonsmokers, light smokers, or heavy smokers. The aim of this study is to analyze the effects of cigarette smoking on the high-density lipoprotein cholesterol (HDL-C) level and to determine whether smoking reduces this level, using the completely randomized design (CRD) and the Kruskal-Wallis method. The results showed that the assumptions of the one-way ANOVA are not satisfied for the original data but are satisfied after a log transformation. The results indicate a significantly lower level of HDL-C in smokers than in non-smokers.


Introduction
Smoking can negatively impact health in many different ways, including the possible impacts on blood cholesterol. Having high cholesterol levels and smoking can be a dangerous combination for the function of the heart [1]. Smoking is now increasing rapidly throughout the developing world [2]. It is correlated to increases in the concentrations of serum total cholesterol and high-density lipoprotein cholesterol, which are in turn positively associated with the risk of coronary heart disease [3]. For the above reasons, the analysis of high-density lipoprotein cholesterol requires implementing scientific methods, both parametric and non-parametric. Analysis of variance (ANOVA) is one of the most frequently used statistical methods [4,5]. It allows comparing the mean values of more than two groups in a continuous response variable [6]. It can be thought of as an extension of the t-test for two independent samples to more than two groups [7]. The development of analysis of variance is due to the work of Ronald A. Fisher (1925). Much of the early work in this area dealt with agricultural experiments [4,7].
Ahmed, Iraqi Journal of Science, 2021, Vol. 62, No. 4, pp: 1231-1237. ISSN: 0067-2904
The valid application of ANOVA depends on three preconditions: independence of samples, normal distribution of error, and homogeneity of variances. Dependence can be eliminated by an appropriate model [5][6][7][8][9].
The appropriate statistical methods for analyzing the data depend on the selected measurement scale and experimental design [10]. Analysis of variance is robust to departures from the normality assumption, but it may be inappropriate when the assumption of homogeneity of variance has been violated [6]. The Kruskal-Wallis test is a popular rank-based statistical method of analysis and is the nonparametric equivalent of the one-way ANOVA [10].
Transformation of data is another technique used to solve the problems of non-normality [11] and inhomogeneous variances [6]. Several data transformation techniques are available to normalize data from a non-normal form [11]. The most commonly used transformations are the square root, logarithmic, and arcsine transformations, which reduce heterogeneity and normalize distributions [12]. Singh [2] studied the effects of cigarette smoking on the lipid profile among smokers who had smoked for more than 20 years. He found that high-density lipoprotein cholesterol was significantly higher in non-smokers than in smokers.
The purpose of this study is to analyze the effects of cigarette smoking on high-density lipoprotein cholesterol and to determine whether smoking causes a reduction in its blood level, by using the completely randomized design and the Kruskal-Wallis method.

Completely Randomized Design
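CRD is a parametric method used to compare more than two groups. Its mathematical model is [8,13]
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \dots, k, \quad j = 1, \dots, n_i \tag{1}$$
where $y_{ij}$ is the $j$-th observation of the $i$-th treatment, $\mu$ is the population mean, $\tau_i$ is the treatment effect of the $i$-th level, and $\varepsilon_{ij}$ is the random error. Equation (1) can be rewritten as
$$y_{ij} = \mu_i + \varepsilon_{ij}, \quad \mu_i = \mu + \tau_i \tag{2}$$
In order to analyze the differences among the group means, the total variability of the observations ($SS_T$) is calculated and partitioned into two components: the treatment sum of squares ($SS_{Tr}$) and the error sum of squares ($SS_E$) [14]. For the one-way design, the sums of squares can be written as
$$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2 = SS_{Tr} + SS_E$$
$$SS_{Tr} = \sum_{i=1}^{k} n_i (\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS_E = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2$$
where $\bar{y}_{i.}$ is the mean of the $i$-th treatment, $\bar{y}_{..}$ is the grand mean, and $N = \sum_{i=1}^{k} n_i$ is the total number of observations. The test statistic is
$$F = \frac{MS_{Tr}}{MS_E} = \frac{SS_{Tr}/(k-1)}{SS_E/(N-k)}$$
where $MS_{Tr}$ denotes the mean square for treatment and $MS_E$ denotes the mean square for error. The F statistic follows an F-distribution with $(k-1)$ degrees of freedom for treatment and $(N-k)$ degrees of freedom for the error term. The completely randomized design has several assumptions that need to be fulfilled, including normality, homogeneity of variances, and independence of means and variances.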
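The partition of the total variability and the resulting F ratio can be sketched in a few lines of Python. This is only a minimal illustration on made-up values, not the study data (which were analyzed in Minitab); the function name `one_way_anova` and the sample values are assumptions introduced here.

```python
def one_way_anova(groups):
    """Return (SS_Tr, SS_E, SS_T, F) for a list of samples, one list per treatment."""
    k = len(groups)                              # number of treatments
    N = sum(len(g) for g in groups)              # total number of observations
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Treatment (between-group) and error (within-group) sums of squares.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)

    ms_tr = ss_tr / (k - 1)                      # mean square for treatment
    ms_e = ss_e / (N - k)                        # mean square for error
    return ss_tr, ss_e, ss_t, ms_tr / ms_e


# Hypothetical HDL-C-like values (mg/dL), NOT the data of the study.
groups = [[50, 47, 52, 45], [46, 44, 49, 43], [35, 31, 38, 30]]
ss_tr, ss_e, ss_t, f_stat = one_way_anova(groups)
print(ss_tr, ss_e, ss_t, f_stat)
```

The returned F value would then be compared with the F-distribution with (k-1) and (N-k) degrees of freedom; the p-value lookup is left out of this sketch.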

Assumptions of CRD
In this section, some assumptions of the CRD are given.

Normality
Most parametric tests require that the assumption of normality be met. Normality means that the data follow a normal distribution, and the assumption is tested under the hypotheses
$$H_0: \text{the data are normally distributed} \quad \text{vs.} \quad H_1: \text{the data are not normally distributed} \tag{8}$$
To test the assumption of normality, the following tests are used:
1. The Shapiro-Wilk test is a test for normality that was developed by Shapiro and Wilk (1965) [15] and is the most powerful test in most situations [15][16][17]. The statistic for this test is
$$W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{9}$$
where $x_{(i)}$ is the $i$-th order statistic, $a_i$ are the tabulated Shapiro-Wilk coefficients, $\bar{x}$ is the sample mean, and $n$ is the number of observations. If the p-value is over 0.05, we fail to reject the null hypothesis that the sample comes from a normal distribution.
2. The Kolmogorov-Smirnov test is another test for normality, which was first proposed by Kolmogorov (1933) and then developed by Smirnov (1939) [9]. The statistic for this test is
$$D = \max_x \left| F_0(x) - S_n(x) \right| \tag{10}$$
where $F_0(x)$ is the (expected) cumulative distribution function of the random variable $x$ under normality and $S_n(x)$ is the observed cumulative frequency of the variable $x$ from the sample. If the resulting D statistic is significant, then the hypothesis that the sample comes from a normally distributed population is rejected.
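The Kolmogorov-Smirnov statistic, the largest absolute gap between the expected normal cumulative distribution function and the observed cumulative frequency, can be computed directly. The sketch below is a hedged illustration: it assumes that the mean and standard deviation of the reference normal distribution are estimated from the sample itself (which makes it the Lilliefors variant and changes the critical values), and the function names and data are hypothetical. The Shapiro-Wilk statistic is not sketched because its coefficients require tables.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample):
    """D = max |F0(x) - Sn(x)| against a normal with sample mean and SD."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f0 = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, abs(i / n - f0), abs(f0 - (i - 1) / n))
    return d


# Hypothetical HDL-C-like values, NOT the data of the study.
sample = [50, 47, 52, 45, 46, 44, 49, 43, 35, 31, 38, 30]
print(ks_statistic(sample))
```

Only the statistic is computed here; the p-value would come from the appropriate tables or from statistical software.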

Homogeneity of Variances
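Levene's test is used as a preliminary check of the equal variance (homogeneity of variances) assumption in ANOVA [7,11]. Levene's (1960) [7] original article was motivated by the k-sample problem: before comparing the sample means, one should check that the underlying populations have a common variance. The test hypotheses are
$$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 \quad \text{vs.} \quad H_1: \sigma_i^2 \neq \sigma_j^2 \text{ for at least one pair } (i, j) \tag{11}$$
The test statistic is
$$W = \frac{(N-k)}{(k-1)} \cdot \frac{\sum_{i=1}^{k} n_i (\bar{Z}_{i.} - \bar{Z}_{..})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_{i.})^2} \tag{12}$$
where $Z_{ij} = |y_{ij} - \bar{y}_{i.}|$, $N = \sum_{i=1}^{k} n_i$, $\bar{Z}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Z_{ij}$, and $\bar{Z}_{..} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Z_{ij}$.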
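Levene's statistic amounts to a one-way ANOVA F ratio computed on the absolute deviations of each observation from its group mean. The following sketch illustrates this under that reading of the test (centering at the group mean, as in Levene's original proposal); the function name `levene_w` and the sample values are hypothetical.

```python
def levene_w(groups):
    """Levene's W using absolute deviations from the group means."""
    k = len(groups)
    N = sum(len(g) for g in groups)

    # Z_ij = |y_ij - mean of group i|
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / N

    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (N - k) / (k - 1) * between / within


# Hypothetical values, NOT the data of the study.
groups = [[50, 47, 52, 45], [46, 44, 49, 43], [35, 31, 38, 30]]
print(levene_w(groups))
```

A large W, judged against the F-distribution with (k-1) and (N-k) degrees of freedom, would indicate unequal variances. The sketch does not guard against the degenerate case in which all deviations within every group are identical.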

Independence of Means and Variances
The independence of means and variances is the other assumption of the analysis of variance, and we use the simple correlation coefficient to determine the relationship between the means and the variances. The tested hypotheses are
$$H_0: \rho = 0 \quad \text{vs.} \quad H_1: \rho \neq 0 \tag{13}$$
where $\rho$ is the correlation coefficient, the significance of which is tested through the t test. The statistic for this test is
$$t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}} \tag{14}$$
where $r$ is the sample correlation coefficient and $n$ is the number of pairs of means and variances. If the p-value is over the level of significance, we fail to reject the null hypothesis that the correlation between the means and the variances is zero, that is, the means and variances can be regarded as independent.
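The correlation between group means and group variances, and the t statistic used to test it, can be sketched as follows. This is an illustrative sketch only: the helper names `pearson_r` and `corr_t` and the data are hypothetical, and each group is assumed to contribute one (mean, variance) pair.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical values, NOT the data of the study.
groups = [[50, 47, 52, 45], [46, 44, 49, 43], [35, 31, 38, 30]]
means = [mean(g) for g in groups]
variances = [variance(g) for g in groups]

r = pearson_r(means, variances)
t = corr_t(r, len(groups))
print(r, t)
```

With only three groups the test has a single degree of freedom, so it has little power to detect a relationship between means and variances.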

Kruskal-Wallis Test
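The Kruskal-Wallis test is a nonparametric method for testing whether samples originate from the same distribution. Since it is a nonparametric test, it does not assume that the response variable is normally distributed. The test hypotheses are
$$H_0: \text{the } k \text{ samples come from the same distribution} \quad \text{vs.} \quad H_1: \text{at least one sample comes from a different distribution} \tag{15}$$
When there are no ties, the test statistic is given by [4,8]
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \tag{16}$$
where $k$ is the number of samples, $n_i$ is the number of observations in the $i$-th sample, $N$ is the number of observations in all samples combined, and $R_i$ is the sum of the ranks in the $i$-th sample. If there are ties, the test statistic is given by
$$H' = \frac{H}{1 - \dfrac{\sum_{j=1}^{g} (t_j^3 - t_j)}{N^3 - N}} \tag{17}$$
where $g$ is the number of groups with tied values and $t_j$ is the number of tied observations in the $j$-th group. For a level $\alpha$ test, reject $H_0$ if $H \geq \chi^2_{\alpha, k-1}$ or, equivalently, if the p-value is less than $\alpha$.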
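The Kruskal-Wallis statistic, with and without the correction for ties, can be sketched in plain Python as below. The function names and the data are hypothetical. The p-value helper covers only the case of three samples, where the chi-square distribution has 2 degrees of freedom and its upper-tail probability is exp(-H/2); other numbers of samples would need a chi-square table or statistical software.

```python
import math


def kruskal_wallis(groups):
    """Return (H, H corrected for ties) for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    N = len(pooled)

    # Assign average ranks to tied values and record the size of each tie.
    rank_of, tie_sizes = {}, []
    i = 0
    while i < N:
        j = i
        while j < N and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2     # mean of ranks i+1 .. j
        if j - i > 1:
            tie_sizes.append(j - i)
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (N * (N + 1)) * rank_term - 3 * (N + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)
    return h, h / correction


def p_value_three_samples(h):
    """Upper-tail chi-square probability with 2 degrees of freedom (k = 3 only)."""
    return math.exp(-h / 2)


# Hypothetical values, NOT the data of the study.
groups = [[50, 47, 52, 45], [46, 44, 49, 43], [35, 31, 38, 30]]
h, h_ties = kruskal_wallis(groups)
print(h, h_ties, p_value_three_samples(h_ties))
```

When no values are tied, the correction factor equals 1 and the two returned statistics coincide.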

Practical Part
In this study, the effect of tobacco smoking on the HDL-C level is analyzed. The data were collected from a sample of 65 adult males who were nonsmokers, light smokers (one cigarette/day), or heavy smokers (more than one cigarette/day), from the chemical analysis laboratory (Aya Lab) in Chamchamal city, North Iraq. The data were analyzed with Minitab software v. 17.

Parametric ANOVA Results
Before the CRD analysis is carried out, the data must be tested to determine whether its assumptions are satisfied. To test the assumption of normality, the Shapiro-Wilk and Kolmogorov-Smirnov tests were used, and the calculations were made according to the aforementioned equations. Since the p-values of both the Kolmogorov-Smirnov (0.019) and Shapiro-Wilk (0.00) tests are less than the level of significance (0.05), the null hypothesis in (8) is rejected, and the data are not normally distributed. To test the assumption of homogeneity of variances, Levene's test is used. The p-value of Levene's test is less than the level of significance (0.05), which implies that the null hypothesis in (11) is rejected and that the variances are not homogeneous.
To test the assumption of independence of means and variances, the simple correlation coefficient is used.
The resulting value is 0.661 with a p-value of 0.54, which is greater than the level of significance (0.05). This implies that the null hypothesis in (13) cannot be rejected and that the means and variances are independent. Hence, of the three assumptions tested, two (normality and homogeneity of variances) were not met.
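The reported p-value can be checked by hand under two assumptions that the text does not state explicitly: that 0.661 is the correlation coefficient $r$, and that it is computed from three (mean, variance) pairs, one per smoking group. The t statistic for a correlation is $t = r\sqrt{n-2}/\sqrt{1-r^2}$, and with $n - 2 = 1$ degree of freedom the t-distribution is the standard Cauchy distribution, so the two-sided p-value has a closed form:

```latex
t = \frac{0.661\,\sqrt{3-2}}{\sqrt{1 - 0.661^2}}
  = \frac{0.661}{\sqrt{0.5631}} \approx 0.881,
\qquad
p = P\left(|T_1| > 0.881\right)
  = 1 - \frac{2}{\pi}\arctan(0.881) \approx 0.54
```

Under these assumptions the result agrees with the reported p-value of 0.54.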
The original data are transformed using the log transformation to reduce heterogeneity and to normalize the distributions. After the transformation, the p-values of both the Kolmogorov-Smirnov test (0.2) and the Shapiro-Wilk test (0.259) are greater than the level of significance (0.01), which implies that the null hypothesis in (8) cannot be rejected and that the data are normally distributed.
The value of Levene's test statistic is 2.45 with a p-value of 0.095. The p-value is greater than the level of significance (0.01), which implies that the null hypothesis in (11) cannot be rejected and there is no problem of homogeneity of variances. Hence, the assumptions were tested and met. The parametric analysis of variance is run on this transformed dataset, as given in Table-2. The appropriate test statistic for the pairwise comparisons is the least significant difference (LSD):
$$LSD = t_{\alpha/2,\, N-k} \sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$
where $MS_E$ is the mean square for error from the ANOVA, $N-k$ is its degrees of freedom, and $n_i$ and $n_j$ are the sizes of the two groups being compared. From Table-3, we clearly observe that the mean difference between groups 1 and 2 is non-significant at 0.05, the mean difference between groups 2 and 3 is non-significant at 0.05, and the mean difference between groups 1 and 3 is significant at both the 0.01 and 0.05 levels of significance.
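The variance-stabilizing effect of a log transformation can be illustrated with a small sketch. The data below are deliberately artificial (each group's spread grows in proportion to its mean) and are not the study data; the natural logarithm is an assumption, since the base of the logarithm used in the study is not stated. The base does not affect the ratio of variances.

```python
import math
from statistics import variance

# Artificial groups whose spread is proportional to their level, NOT study data.
groups = [[10, 12, 14], [100, 120, 140], [1000, 1200, 1400]]


def variance_ratio(samples):
    """Ratio of the largest to the smallest group variance."""
    variances = [variance(s) for s in samples]
    return max(variances) / min(variances)


# Natural log assumed; any base gives the same ratio of variances.
logged = [[math.log(y) for y in g] for g in groups]

ratio_before = variance_ratio(groups)
ratio_after = variance_ratio(logged)
print(ratio_before, ratio_after)
```

In this artificial case the group variances differ by a factor of 10000 on the original scale and become equal after the transformation, which is the behavior the log transformation is meant to approximate when the standard deviation grows with the mean.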

Non-Parametric Results
The value of the Kruskal-Wallis test statistic is calculated using Minitab 17. The p-value is less than the level of significance (0.01), so we conclude that the effect of smoking is significant at the 0.01 level of significance. We can now conduct the multiple comparisons of the pairwise differences. From Table-4, it is observed that the differences between groups 1 and 2 and between groups 2 and 3 are non-significant at the 0.01 level of significance, while the difference between groups 1 and 3 is significant.

Conclusions
From the results of the present study, it is concluded that the assumptions of the one-way ANOVA are not satisfied for the original data but are satisfied after the log transformation. There was a significant decrease in the level of HDL-C in smokers in comparison to that in non-smokers. The mean values of the HDL-C level for non-smokers, light smokers, and heavy smokers were 46.679, 45.952, and 32.598, respectively. If the assumptions of the one-way ANOVA F-test are not met, then we can either apply the ANOVA F-test after transforming the data or apply the non-parametric rank test to the original data.