Classification of Some Tests of Normality into UMP and LMP Using the Monte Carlo Simulation Technique

In Statistics, the test of normality is of great importance and cannot be neglected in statistical analysis. However, many techniques exist for such analysis, and researchers usually face the choice of which test to use. It has been established in the literature that the power of normality tests varies significantly with sample size. In this study, seven normality tests were reviewed and classified into LMP and UMP based on Power-of-Test. Hypotheses were tested at the 5% level of significance. The tests considered are the Shapiro-Wilk, Anderson-Darling, Bonett-Seier, Robust Jarque-Bera, Skewness, Lilliefors and Kurtosis tests. The sample sizes considered are 10, 20, 50 and 100, with 1000 replicates each. Data were simulated from three distributions, namely the normal, gamma and beta distributions. It was found that all methods were strong in detecting normality when the normal distribution was used, but the variation in their power was obvious when non-normal distributions were used. Among the methods, only three can be referred to as UMP, while the rest are LMP. The UMP methods are Shapiro-Wilk, Anderson-Darling and Lilliefors, as their Power-of-Test was not affected by sample size.


Introduction
The test of normality is commonly used in data analysis irrespective of the researcher's field of study, especially in determining the most appropriate statistical technique for data analysis or hypothesis testing. Statistical methods are derived under basic assumptions outside which the methods become invalid [5]. For instance, in the case of the Chi-Square test of independence, it is assumed that the cells contain non-zero observed values and that no more than 20% of the cells have expected values less than 5. In a similar manner, the Z-test and T-test, which are often used by researchers to test for significant differences between variables, must be used with caution, as these methods carry basic assumptions such as normality of the variables and adequacy of the sample size [1,25].
Generally, statistical methods are of two types, namely parametric and nonparametric methods. Before a parametric method is used, a test of normality is needed to determine the suitability of the method or test statistic [20]. The literature contains more than one hundred tests of normality, both univariate and multivariate, and the sensitivity of these methods has been shown to depend on factors such as sample size. According to Ukponmwan and Ajibade [35], some normality tests are more sensitive for small sample sizes, while others are better used when the sample sizes are large. In the multivariate case, the sample size of the variables plays a key role in the choice of normality test, as shown in the literature [11,13].
Despite the improvements to normality testing methods presented by researchers from different fields, there is a noticeable gap in the sensitivity of the methods, which results in Type I or Type II errors in many decisions taken by researchers. Comparing the output of the Ryan-Joiner test as implemented in Minitab with that of the Kolmogorov-Smirnov test on the same data set, it was observed that the researcher's decision varies with the method used for the normality test. This implies that a better understanding of a data set is needed before analysis or hypothesis testing.
This study is aimed at classifying some selected normality tests into Uniformly Most Powerful (UMP) and Locally Most Powerful (LMP). For a robust study, the following objectives were considered: 1) determination of the effect of sample size on the performance of normality tests; 2) rating of the methods in terms of Power-of-Test and Type-I error.

Review of Related Studies
Stephen [32] examined the assumption of multivariate normality (MVN). According to the researcher, over 50 tests of this assumption have been proposed in the past 50 years. However, for various reasons, practitioners are often reluctant to address the MVN issue. In the research, several techniques for assessing MVN based on well-known tests for univariate normality were described, and suggestions were offered for their practical application. The techniques were illustrated using two previously published real-life data sets. In one of the examples, it was shown that simply testing each marginal distribution for univariate normality can lead to a mistaken conclusion.
According to Bonett and Seier [9], kurtosis can be measured in more than one way. A modification of Geary's measure of kurtosis is shown to be more sensitive to kurtosis in the center of the distribution, while Pearson's measure of kurtosis is more sensitive to kurtosis in the tails. The modified Geary measure and the Pearson measure are used to define a joint test of kurtosis that has high uniform power across a very wide range of symmetric non-normal distributions.
Lobato and Velasco [24] proposed a skewness-kurtosis test statistic studentized by standard error estimators that are consistent under serial dependence of the observations. The standard error estimators are sample versions of the asymptotic quantities that do not incorporate any down-weighting, and hence no smoothing parameter is needed. Therefore, the main feature of the proposed test is its simplicity, because it does not require the selection of any user-chosen parameter such as a smoothing number or the order of an approximating model.
Thadewald and Büning [33] compared various normality tests with regard to their power. The well-known tests considered were those of Jarque and Bera [17-19], Kuiper [22] and Shapiro and Wilk [29], as well as tests of the Kolmogorov-Smirnov and Cramer-von Mises type. The tests of normality were based, first, on independent random variables (model I) and, second, on the residuals of the classical linear regression (model II). They investigated the exact critical values of the Jarque-Bera test and of the Kolmogorov-Smirnov and Cramer-von Mises tests, in the latter case for the original and standardized observations, where the unknown parameters µ and σ have to be estimated. The power comparison was carried out via Monte Carlo simulation assuming a model of contaminated normal distributions with varying parameters µ and σ and different proportions of contamination. It turns out that for the Jarque-Bera test, the approximation of critical values by the chi-square distribution does not work very well. The test is superior in power to its competitors for symmetric distributions with medium to long tails and for slightly skewed distributions with long tails. The power of the Jarque-Bera test is poor for distributions with short tails, especially if the shape is bimodal; sometimes the test is even biased. In such cases, a modification of the Cramer-von Mises test or the Shapiro-Wilk test may be recommended.
Ralph [28] examined experimentally derived data sets generated in the practice of clinical chemistry. It was stated that graphical presentation is essential for assessing a data distribution, and that the distribution must also be assessed quantitatively to determine whether the data are normal or not. Finally, the results of the normality tests must be shown to be free of sample-size effects. In the work, four experimentally derived data sets were used, representing normal, positively skewed and negatively skewed distributions. These data sets were examined by graphical techniques, moment tests and tests of normality, and were monitored for sample-size effects. It was concluded that the preferred graphical techniques are the histogram and the box-and-whisker plot, which may be supplemented with advantage by quantile-quantile or probability-probability plots. Classical tests of skewness and kurtosis can produce conflicting and often confusing results; as a consequence, the alternative use of the newer L-moments is advocated. The normality tests included the Kolmogorov-Smirnov (Lilliefors modification), Cramer-von Mises and Anderson-Darling tests (empirical distribution function statistics) and the Gan-Koehler, Shapiro-Wilk, Shapiro-Francia and Filliben tests.
Shengyi and Robert [31] used census block group data on socio-demographics, land use and travel behavior to test the cutoffs suggested in the literature for trustworthy estimates and hypothesis-testing statistics, and to evaluate the efficacy of deleting observations as an approach to improving multivariate normality in structural equation modeling. The results showed that the measures of univariate and multivariate non-normality fall into the acceptable range for trustworthy maximum likelihood estimation after a few true outliers are deleted.
Yap and Sim [36] compared the power of eight selected normality tests: the Shapiro-Wilk test, the Kolmogorov-Smirnov test, the Lilliefors test, the Cramer-von Mises test, the Anderson-Darling test, the D'Agostino-Pearson test, the Jarque-Bera test and the Chi-squared test. Power comparisons of these eight tests were obtained via Monte Carlo simulation of sample data generated from alternative distributions that follow symmetric short-tailed, symmetric long-tailed and asymmetric distributions. The simulation results showed that for symmetric short-tailed distributions, the D'Agostino and Shapiro-Wilk tests have better power. For symmetric long-tailed distributions, the power of the Jarque-Bera and D'Agostino tests is quite comparable with that of the Shapiro-Wilk test. As for asymmetric distributions, the Shapiro-Wilk test is the most powerful, followed by the Anderson-Darling test.
Adefisoye, Golam and George [2] examined problems of testing for normality in both theoretical and empirical statistical research. The performances of eighteen normality tests were examined. Monte Carlo samples were generated from various symmetric and asymmetric distributions for sample sizes ranging from 10 to 100. The performance of the test statistics was compared based on empirical Type I error rate and power of the test. The simulation results showed that the Kurtosis test is the most powerful for symmetric data and the Shapiro-Wilk test is the most powerful for asymmetric data.
According to Felix and Senyo [12], most parametric methods rely on the assumption of normality, and results obtained from these methods are more powerful than those of their non-parametric counterparts. However, for valid inference, the assumptions underlying the use of these methods must be satisfied. According to the researchers, many published statistical articles that rely on the assumption of normality fail to verify it; hence, quite a number of published statistical results are presented with errors. To reduce this, several methods have been proposed for assessing univariate and multivariate normality. In the univariate setting, the Q-Q plot, histogram, box plot, stem-and-leaf plot and dot plot are graphical methods that can be used; the properties of the normal distribution also provide an alternative approach to assessing normality, and the Kolmogorov-Smirnov (K-S) test provides a starting point for assessing normality in the multivariate setting. A scatter plot for each pair of variables, together with a Gamma plot (Chi-squared Q-Q plot), is used in assessing bivariate normality.
Ukponmwan and Ajibade [35] examined the sensitivity of nine normality test statistics (W/S, Jarque-Bera, Adjusted Jarque-Bera, D'Agostino, Shapiro-Wilk, Shapiro-Francia, Ryan-Joiner, Lilliefors and Anderson-Darling) with a view to determining how effectively the techniques identify whether a data set comes from a normal distribution. Simulated data of sizes 5, 10, …, 100 were used for the study, and each test was repeated 100 times for increased reliability. Data from normal distributions (N(2, 1) and N(0, 1)) and non-normal distributions (asymmetric and symmetric: Weibull, Chi-Square, Cauchy and t-distributions) were simulated and tested for normality using the nine test statistics. To ensure uniformity of results, the researchers used a single software package for all computations, eliminating variation due to statistical software. The error rate of each test statistic was computed; the error rate for the normal distributions is the Type I error, and that for the non-normal distributions is the Type II error. The power of test was computed for the non-normal distributions and used to determine the strength of the methods. The ranking of the nine test statistics in order of superiority for small sample sizes is: Adjusted Jarque-Bera, Lilliefors, D'Agostino, Ryan-Joiner, Shapiro-Francia, Shapiro-Wilk, W/S, Jarque-Bera and Anderson-Darling; for large sample sizes it is: D'Agostino, Ryan-Joiner, Shapiro-Francia, Jarque-Bera, Anderson-Darling, Lilliefors, Adjusted Jarque-Bera, Shapiro-Wilk and W/S. Hence, only the D'Agostino test statistic is classified as Uniformly Most Powerful, since it is effective for both small and large sample sizes; the other methods are Locally Most Powerful. Shapiro-Francia, an improvement on Shapiro-Wilk, is more sensitive for both small and large samples and should therefore replace Shapiro-Wilk, while the Adjusted Jarque-Bera and the Jarque-Bera should be retained for small and large samples respectively.
Tanweer [34] evaluated the performance of some selected normality tests over the skewed alternative space. According to the researcher, the stringency concept allows the 12 tests to be ranked uniquely. Among the methods considered, the Bonett and Seier test (Tw) turned out to be the best statistic for slightly skewed alternatives, and the Anderson-Darling (AD), Chen-Shapiro (CS), Shapiro-Wilk (W) and Bispo, Marques & Pestana (BCMR) statistics were the best choices for moderately skewed alternative distributions. The maximum loss of the Jarque-Bera (JB) test and its robust form (RJB), in terms of deviations from the power envelope, is greater than 50% even for large sample sizes, which makes them less attractive for testing the hypothesis of normality against moderately skewed alternatives. On balance, all selected normality tests except Tw and COIN performed exceptionally well against the highly skewed alternative space.
According to Agnieszka [3], statistical inference in the form of hypothesis tests and confidence intervals often assumes that the underlying distribution is normal. Similarly, many signal processing techniques rely on the assumption that a stationary time series is normal. As a result, a number of tests have been proposed in the literature for detecting departures from normality. In this research, a novel approach to testing normality was developed by constructing a statistical test based on the Edgeworth expansion, which approximates a probability distribution in terms of its cumulants. By modifying one term of the expansion, a test statistic is defined which includes information on the first four moments. A comparison of the proposed test with existing normality tests was performed by analyzing different platykurtic and leptokurtic distributions, including generalized Gaussian, mixed Gaussian, α-stable and Student's t distributions. It was shown that the proposed test is superior in terms of power for platykurtic distributions, whereas for leptokurtic ones it is close to the best tests, such as those of D'Agostino-Pearson, Jarque-Bera and Shapiro-Wilk.
Bruno and Norbert [6] reviewed developments in affine-invariant tests for multivariate normality, with special emphasis on the asymptotic properties of several classes of weighted L2-statistics. According to the researchers, since weighted L2-statistics typically have normal limit distributions under fixed alternatives to normality, they open the ground for neighborhood-of-model validation of normality. The paper also reviewed several other invariant tests for this problem, notably the energy test, and presented the results of a large-scale simulation study. All tests under study were implemented in an accompanying R package.
Nasrin [26] compared different univariate normality testing procedures using a new algorithm. Different univariate and multivariate tests were also analyzed, together with an efficient algorithm for calculating the size-corrected power of a test, which can be used to compare the efficiency of the tests. In the research, 100 data sets with sample sizes n = 10, 20, 25, 30, 40, 50, 100 were generated from the uniform distribution and tested using different tests for randomness. The assessment of normality using statistical tests is sensitive to sample size. It was observed that overall power increases with sample size, and that the Shapiro-Wilk (SW) test is the most powerful among the tests considered.

Methodology
Every research study has the main aim of finding a solution to some defined problem, and in attempting to solve the problem, various techniques and strategies are employed. The research process usually involves identifying the problem, making hypothetical statements about the presumed relationships among the variables, and collecting and analyzing data using appropriate statistical tools. This procedure is known as the research methodology. In this section, the processes mentioned above are treated, and the statistical tools to be employed in the analysis of the data are highlighted.

Statistical Methods
The following univariate normality tests were considered in this research:

Anderson-Darling Test [AD]
The AD test statistic is of the form:
\[ A^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - \Phi(x) \right]^2 \psi(x) \, d\Phi(x), \qquad \psi(x) = \frac{1}{\Phi(x)\left[1 - \Phi(x)\right]}, \]
where $F_n(x)$ is the empirical distribution function, $\Phi(x)$ is the cumulative distribution function of the standard normal distribution and $\psi(x)$ is a weight function [4].
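For the ordered, standardized sample $z_{(1)} \le \dots \le z_{(n)}$, the AD statistic has the standard computational form $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln \Phi(z_{(i)}) + \ln\left(1-\Phi(z_{(n+1-i)})\right)\right]$. The following minimal pure-Python sketch (not from the paper) implements that formula:

```python
import math

def phi(x):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def anderson_darling(sample):
    """Anderson-Darling A^2 statistic for normality, standardizing the
    data with the sample mean and (n-1)-denominator standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / sd for x in sample)
    # Standard computational form of the weighted-EDF integral
    s = sum((2 * i + 1) * (math.log(phi(z[i])) + math.log(1.0 - phi(z[n - 1 - i])))
            for i in range(n))
    return -n - s / n
```

Large values of $A^2$ indicate departure from normality; critical values depend on $n$ because the parameters are estimated from the sample.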

D'Agostino-Pearson K 2 test [DK]
The test combines $g_1$ (skewness) and $g_2$ (kurtosis) to produce an omnibus test of normality. The test statistic is:
\[ K^2 = Z_s^2 + Z_k^2, \]
where $Z_s$ and $Z_k$ are the normal approximations to the skewness statistic $s$ and the kurtosis statistic $k$ respectively [7]. The test statistic follows approximately a chi-square distribution with 2 degrees of freedom when the population is normally distributed. The test is appropriate for sample sizes of at least twenty.

Shapiro-Wilk Test [sw]
The Shapiro-Wilk test uses a W-statistic [14,30], defined as
\[ W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \]
where $x_{(i)}$ represents the $i$th order statistic of the sample. The constants $a_i$ are given by
\[ (a_1, \dots, a_n) = \frac{m^{\top} V^{-1}}{\left(m^{\top} V^{-1} V^{-1} m\right)^{1/2}}, \]
where $m = (m_1, m_2, \dots, m_n)^{\top}$ contains the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and $V$ is the covariance matrix of those order statistics.
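Computing the exact Shapiro-Wilk coefficients requires the expected order statistics $m$ and their covariance matrix $V$, which are tabulated or approximated numerically. As a hedged illustration of the same idea, the closely related Shapiro-Francia statistic $W'$ drops $V$, reducing the statistic to the squared correlation between the ordered sample and approximate expected normal order statistics (Blom-type scores). A minimal Python sketch, not taken from the paper:

```python
from statistics import NormalDist, mean

def shapiro_francia_w(sample):
    """Shapiro-Francia W' statistic: squared correlation between the
    ordered sample and approximate expected normal order statistics.
    Values close to 1 support normality."""
    n = len(sample)
    x = sorted(sample)
    # Blom-type approximation to the expected normal order statistics
    m = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mm = mean(x), mean(m)
    cov = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    vx = sum((a - mx) ** 2 for a in x)
    vm = sum((b - mm) ** 2 for b in m)
    return cov * cov / (vx * vm)
```

The full Shapiro-Wilk $W$ additionally weights by $V^{-1}$, but both statistics lie in $(0, 1]$ and are interpreted the same way.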

Bonett-Seier Test [BS]
The test statistic $T_w$ is given by:
\[ T_w = \frac{\sqrt{n+2}\,\left(\hat{\omega} - 3\right)}{3.54}, \]
in which $\hat{\omega}$ is set by
\[ \hat{\omega} = 13.29\left(\ln \hat{\sigma} - \ln \hat{\tau}\right), \qquad \hat{\tau} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|. \]
This statistic follows a standard normal distribution under the null hypothesis.
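A minimal Python sketch of the statistic, assuming the constants 13.29 and 3.54 from Bonett and Seier's published version and the maximum-likelihood (divide-by-$n$) standard deviation; under normality $\hat{\sigma}/\hat{\tau} \approx \sqrt{\pi/2}$, so $\hat{\omega} \approx 3$ and $T_w \approx 0$:

```python
import math

def bonett_seier_tw(sample):
    """Bonett-Seier T_w: a z-type normality test built on Geary's
    kurtosis measure (ratio of sd to mean absolute deviation)."""
    n = len(sample)
    m = sum(sample) / n
    sigma = math.sqrt(sum((x - m) ** 2 for x in sample) / n)  # ML estimate
    tau = sum(abs(x - m) for x in sample) / n                 # mean abs. deviation
    omega = 13.29 * (math.log(sigma) - math.log(tau))
    return math.sqrt(n + 2) * (omega - 3.0) / 3.54
```

The two-sided test rejects normality when $|T_w|$ exceeds the standard normal critical value (1.96 at the 5% level).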

Robust Jarque-Bera Test [RJB]
The Robust Jarque-Bera (RJB) test statistic is defined as
\[ RJB = \frac{n}{C_1}\left(\frac{m_3}{J_n^3}\right)^2 + \frac{n}{C_2}\left(\frac{m_4}{J_n^4} - 3\right)^2, \qquad J_n = \frac{\sqrt{\pi/2}}{n}\sum_{i=1}^{n}\left|x_i - M\right|, \]
where $m_3$ and $m_4$ are the third and fourth sample central moments and $M$ is the sample median. The RJB statistic is asymptotically $\chi^2_2$-distributed.
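A minimal Python sketch of the RJB statistic, assuming the commonly used Gel-Gastwirth constants $C_1 = 6$ and $C_2 = 64$ and the median-based robust scale estimator $J_n$ (the paper's exact constants are not shown above):

```python
import math
from statistics import median

def robust_jarque_bera(sample):
    """Robust Jarque-Bera statistic: like JB, but the skewness and
    kurtosis ratios are scaled by a median-based estimator J_n instead
    of the sample standard deviation."""
    n = len(sample)
    m = sum(sample) / n
    med = median(sample)
    # Robust scale: rescaled average absolute deviation from the median
    jn = math.sqrt(math.pi / 2.0) * sum(abs(x - med) for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return (n / 6.0) * (m3 / jn ** 3) ** 2 + (n / 64.0) * (m4 / jn ** 4 - 3.0) ** 2
```

As a sum of two squared terms, the statistic is always non-negative, and large values are compared against chi-square critical values with 2 degrees of freedom.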

Lilliefors Test (LL)
The test statistic is defined as
\[ D = \sup_x \left| F_n(x) - F^*(x) \right|, \]
where $F_n$ is the sample cumulative distribution function and $F^*$ is the cumulative distribution function (CDF) of the null distribution [23].
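In the normality setting, $F^*$ is a normal CDF whose mean and standard deviation are estimated from the same sample, which is what distinguishes Lilliefors from the plain Kolmogorov-Smirnov test. A minimal Python sketch:

```python
import math

def lilliefors_d(sample):
    """Lilliefors D: largest gap between the empirical CDF and a normal
    CDF fitted with the sample mean and standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / sd for x in sample)
    cdf = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]
    # The supremum is attained at a data point, approached from either side
    d_plus = max((i + 1) / n - c for i, c in enumerate(cdf))
    d_minus = max(c - i / n for i, c in enumerate(cdf))
    return max(d_plus, d_minus)
```

Because the parameters are estimated, $D$ must be compared with Lilliefors (not Kolmogorov-Smirnov) critical values, which are smaller for a given $n$ and level.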

Cramer-Von Mises Test (CVM)
In statistics, the Cramer-von Mises criterion is used for judging the goodness of fit of a cumulative distribution function $F^*$ compared to a given empirical distribution function $F_n$, or for comparing two empirical distributions. It is also used as part of other algorithms, such as minimum distance estimation. It is defined as
\[ W^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - F^*(x) \right]^2 \, dF^*(x). \]
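For a sample of size $n$, the integral has the standard computational form $W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F^*(x_{(i)})\right)^2$. A minimal Python sketch against a fitted normal $F^*$:

```python
import math

def cramer_von_mises(sample):
    """Cramer-von Mises W^2 statistic against a normal CDF fitted with
    the sample mean and standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / sd for x in sample)
    f = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]
    # Standard computational form of the squared-EDF-distance integral
    return 1.0 / (12 * n) + sum(((2 * i + 1) / (2 * n) - f[i]) ** 2
                                for i in range(n))
```

Like the Anderson-Darling statistic, $W^2$ is non-negative by construction, and larger values indicate a worse fit to normality.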

Skewness Test [SK]
The skewness statistic is defined as
\[ \sqrt{b_1} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^3}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\right)^{3/2}}, \]
where $\bar{x}$ is the sample mean.

Kurtosis Test [KU]
The Kurtosis statistic is defined as:
\[ b_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^4}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\right)^2} - 3, \]
where $\bar{x}$ is the sample mean. The "minus 3" at the end of this formula is a correction that makes the kurtosis of the normal distribution equal to zero, since the kurtosis of a normal distribution is 3.
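Both moment statistics described above can be computed directly from the sample central moments; a minimal Python sketch:

```python
def moment_statistics(sample):
    """Sample skewness and excess kurtosis (moment definitions):
    skew = m3 / m2^(3/2), excess kurtosis = m4 / m2^2 - 3."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

For the symmetric sample 1, 2, 3, 4, 5 the skewness is exactly 0, and the excess kurtosis is negative (a flat, short-tailed shape).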

Data Presentation
Data were simulated from normal and non-normal distributions with varying sample sizes (10, 20, 50 and 100). 1000 replicates were used at every point.

Grouping of Normality Techniques into UMP and LMP
A normality test statistic that is capable of detecting departures from normality for both large and small sample sizes can be referred to as UMP. A test statistic that can only be used for either small or large sample sizes is referred to as LMP [10]. This implies that any normality test classified as UMP can be used irrespective of the sample size of the data, while the LMP tests must be used with caution.

Algorithm for Monte Carlo Simulation
According to Ukponmwan and Ajibade [35], Monte Carlo (MC) simulation can be used to assess the performance of an estimator. The Monte Carlo procedure can be stated thus:
1. Specify the data simulation procedure.
2. Select the desired sample size n.
3. Decide the desired number of replicates.
4. Generate a random sample of size n from the specified distribution.
5. Using the random sample generated, calculate the statistic(s).
6. Repeat steps 4 and 5 for the chosen number of replicates.
7. Examine the results for Type I or Type II error.
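The steps above can be sketched in Python for one test statistic. This illustration uses the skewness statistic with a simulated null critical value; the gamma alternative and its shape parameter are illustrative choices, not the paper's exact simulation settings:

```python
import random

def skewness(sample):
    """Moment-based sample skewness."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    return m3 / m2 ** 1.5

def empirical_power(gen_alt, n=50, reps=1000, alpha=0.05, seed=1):
    """Monte Carlo power estimate: simulate the null (normal) to find
    the critical value of |skewness|, then count rejections under the
    alternative generator gen_alt."""
    rng = random.Random(seed)
    # Steps 1-6 under H0: null distribution of the statistic
    null = sorted(abs(skewness([rng.gauss(0, 1) for _ in range(n)]))
                  for _ in range(reps))
    crit = null[int((1 - alpha) * reps)]
    # Step 7 under H1: fraction of samples whose statistic exceeds crit
    rejections = sum(abs(skewness([gen_alt(rng) for _ in range(n)])) > crit
                     for _ in range(reps))
    return rejections / reps

# A right-skewed gamma alternative (shape 2, scale 1 chosen for illustration)
power = empirical_power(lambda r: r.gammavariate(2.0, 1.0))
```

The same loop, swapping in each test statistic and alternative distribution, produces the Power-of-Test figures used to compare the methods.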

Determination of Sensitivity of Normality Test
In this paper, the p-value was used to determine the sensitivity of the normality testing techniques considered. The decision rule can be stated as: reject the null hypothesis if the p-value is less than the level of significance. In this study, a p-value greater than 0.05 implies that the data set is normally distributed, and the larger the p-value, the stronger the support for normality.
According to Nornadiah and Yap [27], the power of a statistical test is the probability that the test will reject the null hypothesis when the alternative hypothesis is true (i.e. the probability of not committing a Type II error). The power is in general a function of the possible distributions, often determined by a parameter, under the alternative hypothesis. As the power of a test increases, the chance of a Type II error decreases [16,21]. The probability of a Type II error is referred to as the false negative rate (β). Therefore, the power of a test is equal to 1 − β, which is also known as the sensitivity [8,15].

Analysis
In this section, simulation was carried out using symmetric and asymmetric distributions to capture the range of distributions encountered in Statistics, and the data were analyzed using the selected techniques. Table 10 shows the strength of the univariate normality tests considered with respect to the selected distributions. From the table, it can be observed that the power of a univariate test statistic is significantly affected by the nature (distribution) of the data. It is therefore adequate to conclude that sample size and the shape of the data play a vital role in the choice of normality test. From Figures 1 to 9, all the univariate normality tests have significant power, with values greater than 0.5 but less than 1.0. This implies that all the methods respond positively in application, although their strength varies.

Conclusion
The test of normality plays a vital role in the application of Statistics to a data set, as it prevents both Type I and Type II errors. From the comparison, it is clear that every test statistic for determining the normality of a data set is adequate, but sensitivity varies significantly when sample size is taken into consideration. It was observed that the shape of the data also affects the sensitivity of the technique. Care must therefore be taken before treating any of the methods as uniformly most powerful: only a few of them are, while the rest are locally most powerful.

Recommendation
Univariate normality tests must be used with great caution so as to prevent statistical error in the research.
Since there are other methods for the test of normality, it is therefore right to recommend the comparison of some other methods for their sensitivity.
Multivariate normality tests can also be compared, as they can ease the burden of one-by-one univariate testing of normality in a large data set.

Figure 1. Power of test of the Shapiro-Wilk test.

Figure 2. Power of test of the Anderson-Darling test.

Figure 3. Power of test of the Bonett-Seier test.

Figure 5. Power of test of the D'Agostino-Pearson test.

Figure 7. Power of test of the Kurtosis test.

Figure 9. Power of test of the Cramer-von Mises test.
The Lilliefors-corrected K-S test, Shapiro-Wilk test, Anderson-Darling test, Cramer-von Mises test, D'Agostino skewness test, Anscombe-Glynn kurtosis test, D'Agostino-Pearson omnibus test and Jarque-Bera test are also used to test for normality. However, the Kolmogorov-Smirnov (K-S), Shapiro-Wilk, Anderson-Darling and Cramer-von Mises tests are widely used in practice and are implemented in many statistical applications. For multivariate normal data, the marginal distributions and linear combinations should also be normal.

Table 1 .
Error Rate of SW.

Table 2 .
Error Rate of Anderson Darling.

Table 3 .
Error Rate of Bonett-Seier Test.

Table 4 .
Error Rate of Robust Jarque-Bera Test.

Table 5 .
Error Rate of D'Agostino-Pearson Test.

Table 6 .
Error Rate of Skewness Test.

Table 7 .
Error Rate of Kurtosis Test.

Table 9 .
Error Rate of Cramer-von Mises (CVM) Test.

From the tables presented above, it can be observed that no normality test statistic has 100 percent accuracy with respect to the sample sizes and distributions considered. This is a signal that univariate normality tests should be used with utmost caution to prevent errors in hypothesis testing.

Table 10 .
Power-of-Test of Selected Normality Test Statistics.