Distribution Effect on the Efficiency of Some Classes of Population Variance Estimators Using Information of an Auxiliary Variable Under Simple Random Sampling

In many sampling situations, researchers come across a variety of data. These data are largely affected by the parent distribution, which determines the characteristics the data share as well as their behavior. The use of an auxiliary variable in estimating a study variable has been on the increase. Auxiliary variables have been used in estimating population means as well as variances. The variance is very sensitive to the distribution. Thus, estimating the variance using an auxiliary variable might lead to some unexpected results. Hence the need to check the effect of the distribution on the performance of some selected classes of variance estimators. Twelve estimators were selected for comparison. Eight distributions were considered in a simulation study: Normal, Chi-square, Uniform, Gamma, Exponential, Poisson, Geometric and Binomial. A population of size 330 was used, and samples of size 30 were drawn by simple random sampling without replacement. The estimators were compared using Bias and Mean Square Error. The performance of the estimators varies in some distributions. The Gamma and Exponential distributions showed wide variability. The performance of the estimators based on Bias is the same as that based on Mean Square Error. The Mean Square Errors were ranked. The best estimator is t1, followed by t10 and t12. The results showed that the estimators are not distribution free.


Introduction
Sampling is the systematic process of selecting a representative part of a population for study so that inferences can be made about the entire population. Among the advantages of sampling are savings in cost and human resources. One disadvantage is that sampling only enables a researcher to estimate the actual situation rather than observe it exactly. The objective of selecting a sample is to achieve maximum accuracy in one's estimation within a given sample size and to avoid bias. This is important because bias can undermine the integrity of the facts and jeopardize one's research outcome.
Bias is the tendency of a statistic to systematically overestimate or underestimate a population parameter. It is the difference between the expected value of the estimator and the true value of the parameter being estimated. Bias can occur due to unrepresentativeness of the sample (selection bias), a poor measurement process (response bias), and so on. Increasing the sample size tends to reduce sampling error, which is the variability among statistics from different samples drawn from the same population. However, increasing the sample size does not affect survey bias; a large sample cannot correct the methodological problems that produce survey bias.
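In symbols, the definition of bias given above can be written as follows. The notation is an assumption introduced here for illustration: \hat{\theta} denotes an estimator and \theta the population parameter it estimates.

```latex
\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta
```

A positive value indicates systematic overestimation, a negative value systematic underestimation, and an estimator with zero bias is called unbiased.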
Another concept used in assessing the performance of an estimator is the Mean Squared Error (MSE), which conveys the concepts of precision, bias and accuracy in statistical estimation. The MSE is defined as the average of the squared errors, where the error is the difference between the estimator and the quantity being estimated. It can be referred to as the second moment of the error measured about the origin. It incorporates both the variance and the bias of the estimator. If an estimator is unbiased, then its MSE is the same as its variance.
Etaga Harrison Oghenekevwe et al.: Distribution Effect on the Efficiency of Some Classes of Population Variance Estimators Using Information of an Auxiliary Variable Under Simple Random Sampling
The estimation of the population mean is a persistent issue in sampling practice, and efforts have been made to improve the precision of the estimates. The literature on survey sampling describes a great variety of techniques for using auxiliary information by means of ratio, product and regression methods. Particularly in the presence of multi-auxiliary variables, a wide variety of estimators have been proposed, following different ideas and linking together ratio, product or regression estimators, each one exploiting the variables one at a time. [31] was the first author to deal with the problem of estimating the mean of a survey variable when auxiliary variables are made available. He suggested the use of information on more than one supplementary characteristic, positively correlated with the study variable, considering a linear combination of ratio estimators based on each auxiliary variable separately. The coefficients of the linear combination were determined so as to minimize the variance of the estimator.
Analogously, [38] gave a multivariate expression of the product estimator of [28], while [34] suggested a method for using auxiliary variables through a linear combination of single difference estimators. [35] proposed a new ratio estimator using two auxiliary variables in simple random sampling, obtained the Mean Square Error (MSE) equation of the proposed estimator, and showed that it is more efficient than the traditional regression estimators suggested by [15]. [54] proposed a modified ratio type variance estimator for the population variance of the study variable when the population median of the auxiliary variable is known, obtained the Bias and the Mean Squared Error of the estimator, and derived the conditions under which the proposed estimator performs better than the traditional ratio type variance estimator suggested by [15]. [9] compared the estimators of [54] and identified the best estimator. [22] proposed an improved estimator through the well-known kappa technique using [58]. The large sample properties of the estimator were studied up to the first order of approximation, and the optimum value of the characterizing scalar kappa was obtained. They compared their estimator with the existing estimators of population variance using secondary data, and the results showed that, overall, the existing estimators mentioned had smaller MSE than the other estimators. [17] proposed an estimator for the population variance using an auxiliary variable in simple random sampling, obtained the MSE equation of the proposed estimator, and showed that it was more efficient than the traditional ratio type and regression estimators suggested by [14].
[16] suggested a generalized ratio type estimator for the population variance of the study variable utilizing information obtained from two auxiliary variables. They compared the efficiency of the new estimator with that of the generalized ratio product type estimator based on information from an auxiliary variable under simple random sampling without replacement. Empirically, the estimator was more efficient than the usual unbiased estimator and the existing biased variance estimators under the derived conditions and for the chosen scalars and constants, for which its Bias was also smaller. [38,41] introduced some new estimators. [36] worked on a class of product-cum-dual to product estimators of the population mean in survey sampling using auxiliary information. Their study described a wide variety of techniques for using auxiliary information to obtain improved estimators of some of the most common population parameters, such as the population mean, proportion and ratio. It provided a unified treatment of the properties of different estimators, comprising efficiency comparisons and comparisons of the proposed estimator with the sample mean per unit estimator. They concluded that the proposed class of estimators was more efficient than the conventional estimator and the estimators given by [53] under the effective ranges of α along with its optimum values.
[47] addressed the problem of estimating the finite population variance using auxiliary information in simple random sampling. A ratio-cum-difference type class of estimators for the population variance was suggested, with its properties derived under large sample approximation. It was shown that the suggested class of estimators was more efficient than the usual unbiased estimator, the difference estimator, the estimators of [6,13,50,18], and other estimators and classes of estimators. [37] introduced an exponential ratio-type estimator for the population variance and compared its Mean Square Error (MSE) with the MSE of some existing estimators. The correct MSE expression of the same estimator was then obtained up to the first order of approximation, the comparison was made using the corrected expression, and the corresponding exponential ratio-type estimator for the population variance under the double sampling technique was also given.
[40] also introduced new classes of estimators of the finite population mean under double sampling in the presence of non-response, using information on fractional raw moments, with the MSE derived up to the first degree of approximation. They showed that the proposed class of estimators performs better than the usual mean estimator, ratio type estimators, and the estimator of [42]. [45] suggested some new estimators for the ratio and product of two means by utilizing information on an auxiliary variable in situations where measurement errors were present. They analyzed the influence of measurement errors on the biases of their proposed estimators and found that the biases of some of the suggested estimators were affected by measurement errors.
The problem of finding unbiased ratio estimators of the population total of some character with the help of an auxiliary character has drawn much attention in recent years [38]. Under commonly adopted sampling schemes, [12,10,25] and others derived certain unbiased ratio type estimators of the population total, while [23,26,7,29] gave modifications of certain sampling schemes under which their ratio type estimators were unbiased. The former group of authors was primarily concerned with obtaining new unbiased ratio type estimators under common sampling schemes, while the latter group was concerned with introducing small modifications to certain sampling schemes so that the usual ratio estimators for these schemes become unbiased under the modified schemes and their extensions.
Science Journal of Applied Mathematics and Statistics 2020; 8(1): 27-34
[29] gave a general procedure for the unbiased estimation of parameters such as the population total and variance, and showed how a given sampling scheme can be modified to make ratio estimators of such parameters unbiased. The modification can be applied to the sampling schemes commonly met in practice. It was also shown that if a sufficient statistic is available in these modified sampling schemes (as in with-replacement sampling schemes; see [1] and [33]) and if the ratio estimator is not a function of the sufficient statistic, then it can be uniformly improved by the Rao-Blackwell theorem.
[43] worked on estimators of the population variance using an auxiliary attribute, suggesting a generalized class of estimators based on an adaptation of the estimator for the population variance using information on an auxiliary attribute in simple random sampling. The properties of the suggested class of estimators were derived, and the asymptotic optimum estimator was identified along with its properties. A large number of known estimators are members of the suggested generalized class, and it was shown that the proposed class was more efficient than the usual unbiased estimator, the ratio, exponential ratio and regression estimators, and the estimator due to [46]. [20] proposed a modified ratio type estimator for the finite population variance using a transformation of the variables, and the proposed estimator performed better than the existing estimators. [48] proposed a new class of estimators of the population variance in the presence of an auxiliary attribute, obtained their Bias and Mean Squared Error (MSE), and showed that the proposed estimators were more efficient than the existing estimators. [54] used real data to check the performance of their proposed estimator and found that its Bias and MSE were smaller than those of the traditional and existing estimators.
[8] compared thirty-eight ratio estimators using Bias and MSE. They simulated from six distributions (Hypergeometric, Normal, Exponential, Uniform, Binomial and Chi-square) using sample sizes of 17, 25, 39, 50 and 100. They concluded that some estimators are affected by sample size, whereas the best estimator, estimator twenty-four, was distribution free.
[18] suggested modified ratio type variance estimators using known values of the coefficient of variation and coefficient of kurtosis of an auxiliary variable.
[40] improved the exponential ratio type estimator of [2] for the population mean and proposed an exponential ratio type estimator for population variance estimation using a linear combination of the tri-mean and quartiles.
[3] proposed a class of modified ratio type variance estimators for the population variance of the study variable, obtained their Bias and Mean Squared Error (MSE), and found that the proposed estimators performed better than the existing estimators.
[27] proposed a modified ratio type estimator of the population variance of the study variable under simple random sampling without replacement, making use of the coefficient of kurtosis and median of an auxiliary variable, and found that the proposed estimator performed better than the existing estimators.
[21] considered the problem of improved estimation of the population mean under random non-response and found that the proposed approach was more efficient than its existing counterpart, both in terms of mean squared error and in survey practice.
[54] also proposed a modified ratio type variance estimator for the population variance of the study variable when the population median and coefficient of variation of an auxiliary variable are known. They derived the Bias and Mean Squared Error (MSE) of the proposed estimator and found that it performed better than the traditional ratio type variance estimator and the existing ratio type variance estimators.
The use of auxiliary variables in survey sampling has a long history. Auxiliary variables have been used efficiently by numerous survey statisticians to increase the precision of estimates [24,9,11,39,43,49,51,55,53,44,57]. Auxiliary information also plays a significant role in estimation techniques in survey sampling. [4] also introduced the regression estimator as a more efficient method of estimation in survey sampling. [52] showed the efficient use of auxiliary information in improving the performance of estimators. The use of multiple auxiliary variables was discussed using the information on single and multiple auxiliary variables. The estimation of the population mean is a persistent issue in sampling practice, and many efforts have been made to improve the precision of the estimates. The literature on survey sampling describes a great variety of techniques for using auxiliary information by means of ratio, product and regression methods. Particularly in the presence of multi-auxiliary variables, a wide variety of estimators have been discussed, following different ideas and linking together ratio, product or regression estimators, each one exploiting the variables one at a time.

Materials and Methods
The values of the estimators were computed using the expressions presented in Table 1. Data were simulated from eight distributions: Normal, Chi-square, Uniform, Gamma, Exponential, Poisson, Geometric and Binomial. The Biases of the estimators are given in Table 2 to Table 4. The ranks of the estimators are presented in Table 5. The parameters used are defined with the estimators; among them is the covariance between Y and X.
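The simulation design described in this paper (a finite population of size 330, samples of size 30 drawn by simple random sampling without replacement, estimators compared by Bias and MSE) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. The function name `simulate_bias_mse`, the Gamma parameters, the linear relation between Y and X, the number of replications and the seed are all hypothetical. Because the twelve estimators of Table 1 are not reproduced in this text, the sketch compares only the usual unbiased sample variance with a classical ratio-type variance estimator that scales it by the ratio of the known population variance of X to the sample variance of X; that estimator is swapped in for illustration and is not claimed to be one of the twelve.

```python
import numpy as np


def simulate_bias_mse(n_pop=330, n_sample=30, n_reps=5000, seed=0):
    """Monte Carlo Bias and MSE of two variance estimators under SRSWOR.

    All distribution parameters below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: auxiliary X is Gamma distributed,
    # study variable Y is linearly related to X plus Normal noise.
    x = rng.gamma(shape=2.0, scale=2.0, size=n_pop)
    y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, size=n_pop)

    # Population variances with divisor N - 1, the target under SRSWOR.
    S2_y = y.var(ddof=1)
    S2_x = x.var(ddof=1)

    est_unbiased = np.empty(n_reps)
    est_ratio = np.empty(n_reps)
    for r in range(n_reps):
        # Simple random sample without replacement of unit indices.
        idx = rng.choice(n_pop, size=n_sample, replace=False)
        s2_y = y[idx].var(ddof=1)
        s2_x = x[idx].var(ddof=1)
        est_unbiased[r] = s2_y                # usual unbiased estimator
        est_ratio[r] = s2_y * S2_x / s2_x     # ratio-type estimator (assumed form)

    results = {}
    for name, est in (("unbiased", est_unbiased), ("ratio", est_ratio)):
        bias = est.mean() - S2_y              # Monte Carlo estimate of the Bias
        mse = np.mean((est - S2_y) ** 2)      # Monte Carlo estimate of the MSE
        results[name] = {"bias": bias, "mse": mse, "variance": est.var()}
    return S2_y, results


if __name__ == "__main__":
    S2_y, results = simulate_bias_mse()
    print("population variance of Y:", S2_y)
    for name, stats in results.items():
        print(name, stats)
```

Over the replications, each reported MSE equals the reported variance plus the squared Bias, which is the sense in which the MSE incorporates both quantities. Replacing the Gamma draw with another of the eight distributions gives a way to explore how the comparison changes with the parent distribution.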

Conclusion
Using the Mean Square Error (MSE) as a basis for the comparison, the estimators were ranked and it could be observed that:
1. The overall best estimator irrespective of distribution is estimator T1, followed by T10 and T12 in second and third place respectively.
2. The estimators are affected by the distributions. This implies that when using the estimators, researchers should first check the distribution of their data set.
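The ranking step behind these conclusions can be sketched as follows. This is an illustration only: the function `rank_estimators`, the estimator labels and every MSE value are made-up placeholders, not results from this study, and averaging the per-distribution ranks is an assumed way of obtaining an overall order that the text does not confirm.

```python
def rank_estimators(mse_table):
    """Rank estimators by MSE within each distribution, then average the ranks.

    mse_table maps distribution name -> {estimator name -> MSE}.
    Ties are not treated specially in this sketch.
    """
    per_dist = {}
    for dist, mses in mse_table.items():
        ordered = sorted(mses, key=mses.get)  # smallest MSE first
        per_dist[dist] = {est: pos for pos, est in enumerate(ordered, start=1)}

    estimators = list(next(iter(mse_table.values())))
    mean_rank = {
        est: sum(per_dist[d][est] for d in per_dist) / len(per_dist)
        for est in estimators
    }
    return per_dist, mean_rank


# Placeholder MSE values, invented purely to show the mechanics.
placeholder = {
    "normal": {"est_a": 0.9, "est_b": 1.4, "est_c": 2.0},
    "gamma": {"est_a": 5.1, "est_b": 4.2, "est_c": 9.7},
}
per_dist, mean_rank = rank_estimators(placeholder)
print(per_dist)
print(mean_rank)
```

In the placeholder data the order of the first two estimators changes between the two distributions, which is the kind of pattern that would indicate estimators are not distribution free.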