Optimal Allocation in Small Area Mean Estimation Using Stratified Sampling in the Presence of Non-Response

Sample surveys provide reliable current statistics for large areas or subpopulations (domains) with large sample sizes. There is a growing demand for reliable small area statistics; however, the sample sizes are too small to provide direct (area-specific) estimators with acceptable accuracy. This study gives a theoretical description of the estimation of a small area mean using stratified sampling with a linear cost function in the presence of non-response. The small area mean is estimated using auxiliary information, where both the study variable and the auxiliary variable suffer from non-response during sampling. Optimal sample sizes are obtained by minimizing the cost of the survey for a specified precision, using a Lagrangian function with multiplier λ and setting its partial derivatives to zero. The results show that, as the sampling rate tends to one, the number of respondent units supplying information on the study and auxiliary variables tends to the small area population size, and the number of non-response units tends to the number of units that supply the information. The theoretical analysis shows that the mean square error decreases as the sub-sampling fraction and the number of auxiliary characters increase. As the sub-sampling fraction and the value of β increase, the required large sample size decreases, as does the value of the Lagrange multiplier, which minimizes the cost function.


Small Area
A small area is a population for which reliable statistics of interest cannot be computed using standard methods because the sample size in the area is small or even zero. Examples of small areas include geographical regions such as counties, sub-counties and wards, and demographic groups defined by age, sex and race. In sampling, the units are divided into two strata for homogeneity: the first stratum represents respondents and the second stratum represents non-respondents.

Small Area Estimation
According to Rahman [13], small area estimation has received much attention in recent decades because of the increasing demand for reliable small area estimates in both the public and private sectors. Sample data on small areas are inadequate to provide statistical estimates with high precision, which makes it necessary to borrow strength from data on related auxiliary variables through appropriate models.
Small area estimation is therefore any statistical technique that involves the estimation of parameters for small subpopulations. Methods used in small area estimation are categorized as design-based and model-based. According to Rahman [13], design-based methods refer to the particular sampling design used, whereas model-based methods involve statistical methods based on Bayesian approaches.
Among the models used in small area estimation and prediction is the linear mixed model, which has found a wide range of applications, particularly because of its ability to predict linear combinations of fixed and random effects. The best linear unbiased prediction (BLUP) method of Henderson [10] has been widely used, especially for fitting models of genetic trends in animal populations based on traits measured on both continuous and categorical scales. Henderson [10] assumed that the variances associated with the random effects in the mixed model were known, but in practice this is not the case: such variance components are unknown and have to be estimated from the sampled data. Several researchers have proposed methods of estimating these variances, among them Harville [9], who reviewed Henderson's methods together with maximum likelihood and residual maximum likelihood estimation. Harville assumed normality, which does not hold in all applications. Therefore, in this study a design-based model is developed to address the non-linearity of the cost estimation.
The Fay-Herriot model has received much attention in recent years. Abhishek [1] applied it to estimating small area indicators. The model used was of the form

$$\hat{\theta}_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + v_i + e_i,$$

where $\hat{\theta}_i$ is the direct estimate for area $i$, $\mathbf{x}_i$ is a vector of known covariates, $\boldsymbol{\beta}$ is a vector of unknown regression coefficients, $v_i$ is the area-specific random effect and $e_i$ represents the sampling error. Generally, the Fay-Herriot model assumes linearity in the estimation of parameters, which makes it difficult to estimate costs when travel costs are considered a component of the survey cost. Wanjoya et al. [15] carried out a study on small area estimation by incorporating a tuning (index) parameter into the standard area-level (Fay-Herriot) model. They found that the proposed model was a good alternative to the standard Fay-Herriot model, although it did not consider the case of non-response. Different designs and models have been adopted in small area estimation. In this study, stratified sampling is considered in the presence of non-response, together with a linear cost function that accounts for the travel costs incurred during sampling. The main objective of this study is to develop a linear cost model for a stratified sampling design in the presence of non-response and to compute reliable estimates for a given small area.
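A minimal Python sketch of how an area-level model of this form is commonly turned into small area predictions follows. It assumes, as Henderson did, that the variance of the area effect $v_i$ (`sigma_v2`) and the sampling variances $\psi_i = \mathrm{Var}(e_i)$ (`psi`) are known; in practice both must be estimated. The direct estimates $\hat{\theta}_i$ are passed as `y`. The helper names `solve` and `fay_herriot_blup` are hypothetical, and the code shows the textbook BLUP shrinkage form rather than the specific procedures of Abhishek [1] or Wanjoya et al. [15].

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fay_herriot_blup(
    y: Sequence[float],
    X: Sequence[Sequence[float]],
    psi: Sequence[float],
    sigma_v2: float,
) -> Tuple[List[float], List[float]]:
    """Return (beta_hat, theta_hat) for y_i = x_i^T beta + v_i + e_i.

    Assumes Var(v_i) = sigma_v2 and Var(e_i) = psi_i are both known.
    """
    m, p = len(y), len(X[0])
    w = [1.0 / (sigma_v2 + psi_i) for psi_i in psi]  # inverse total variance of y_i
    # Weighted (GLS) normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(m)) for k in range(p)] for j in range(p)]
    xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(m)) for j in range(p)]
    beta = solve(xtwx, xtwy)
    theta = []
    for y_i, x_i, psi_i in zip(y, X, psi):
        gamma = sigma_v2 / (sigma_v2 + psi_i)  # shrinkage weight on the direct estimate
        synthetic = sum(b_j * x_ij for b_j, x_ij in zip(beta, x_i))
        theta.append(gamma * y_i + (1.0 - gamma) * synthetic)
    return beta, theta


beta_hat, theta_hat = fay_herriot_blup(
    y=[10.0, 20.0, 30.0], X=[[1.0], [1.0], [1.0]], psi=[1.0, 1.0, 1.0], sigma_v2=1.0
)
print(beta_hat, theta_hat)  # [20.0] [15.0, 20.0, 25.0]
```

Each prediction is a weighted average of the direct estimate and the synthetic regression estimate $\mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}$, with weight $\gamma_i = \sigma_v^2/(\sigma_v^2 + \psi_i)$ on the direct estimate, so areas with noisier direct estimates borrow more strength from the model.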
According to Arnold et al. [4] [11], a model-based estimator uses a prediction approach in which the dependent variable Y is predicted. Model-based estimates are model-unbiased only within the structure of the specific model. It was found that the model provides precise parameter estimates and an explicit model specification. Aditya et al. [2] developed a method of estimating a domain total for an unknown domain size in the presence of non-response, with a linear cost function, using a two-stage sampling design. The response mechanism was assumed to be deterministic. An expression for the variance of the estimator and a suitable cost function for obtaining the optimum sample size were developed. Empirical results showed that the percentage reduction in the expected cost decreased with a decrease in the unit travel cost.

Optimal Allocation
Saini [14] proposed a method of optimum allocation for a stratified two-stage sampling design in multivariate surveys. The total cost of the survey was expressed as … In his method, the problem of determining optimum allocations was formulated as a non-linear programming problem (NLPP), and the Lagrange multiplier technique was used to solve the formulated NLPPs.
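To make the Lagrange multiplier technique concrete, the sketch below solves the simplest allocation problem of this kind: a single-stage stratified design with one study variable and a linear cost $C = c_0 + \sum_h c_h n_h$. This is a deliberate simplification of Saini's two-stage multivariate NLPP, not his method, and the function name `optimum_allocation` and its parameters are hypothetical. Setting $\partial L/\partial n_h = 0$ gives $n_h \propto W_h S_h/\sqrt{c_h}$, and the budget constraint then fixes $\lambda$.

```python
import math
from typing import List, Sequence, Tuple


def optimum_allocation(
    weights: Sequence[float],     # W_h = N_h / N
    std_devs: Sequence[float],    # S_h, within-stratum standard deviations
    unit_costs: Sequence[float],  # c_h, cost per sampled unit in stratum h
    budget: float,                # total cost C
    overhead: float = 0.0,        # fixed cost c_0
) -> Tuple[List[float], float]:
    """Minimise V = sum_h W_h^2 S_h^2 / n_h subject to c_0 + sum_h c_h n_h = C.

    Lagrangian: L = V + lam * (c_0 + sum_h c_h n_h - C).
    dL/dn_h = -W_h^2 S_h^2 / n_h^2 + lam * c_h = 0  =>  n_h = W_h S_h / sqrt(lam * c_h).
    Substituting into the cost constraint gives
    1 / sqrt(lam) = (C - c_0) / sum_h W_h S_h sqrt(c_h).
    The finite-population term -sum_h W_h S_h^2 / N does not depend on n_h,
    so it does not change the allocation.
    """
    variable_budget = budget - overhead
    if variable_budget <= 0:
        raise ValueError("budget must exceed the fixed overhead cost")
    denom = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, std_devs, unit_costs))
    inv_sqrt_lam = variable_budget / denom
    n = [inv_sqrt_lam * w * s / math.sqrt(c) for w, s, c in zip(weights, std_devs, unit_costs)]
    lam = 1.0 / inv_sqrt_lam ** 2
    return n, lam


n_h, lam = optimum_allocation([0.5, 0.5], [10.0, 20.0], [4.0, 16.0], budget=100.0)
print(n_h, lam)  # [5.0, 5.0] 0.25
```

The allocations are returned as real numbers; in a survey they would be rounded, which perturbs the cost and variance slightly.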
Cherniyak [7] proposed an optimum allocation in double sampling with stratification using a non-linear cost function of the form … He found that the mean squared error (MSE) increased as the second-phase sample size decreased, for all values computed using both the linear and the non-linear cost functions. The MSE of the estimator under both cost functions also increased with increases in the inverse sampling rate and the non-response rate. It was further noted that greater use of auxiliary information reduced the non-response error and thus decreased the MSE.

Estimation of Population Parameters in the Presence of Non-Response
Hansen and Hurwitz [8] suggested a technique for handling non-response in mail surveys, which are advantageous over other survey modes because they are inexpensive. Okafor [12] extended the Hansen and Hurwitz approach to the estimation of the population total in element sampling on two successive occasions. Later, Chaudhary and Kumar [6] used the Hansen and Hurwitz technique to estimate the population product for sampling on two occasions when there was non-response on both occasions. Cochran [5] extended the Hansen and Hurwitz technique to the case in which information on auxiliary characteristics is available in addition to the characteristic under study. Chaudhary and Kumar [6] also proposed a method of estimating the mean of a finite population using a double sampling scheme under non-response. The proposed model assumed that both the study and auxiliary variables suffered from non-response and that the information on the auxiliary variable X was not available. The estimate of x at the first phase was given by … with the corresponding variance …, where …
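The classic Hansen and Hurwitz technique can be sketched as follows, under the usual textbook setup (an assumption here, not the specific estimators of the works cited above): a first-phase simple random sample of $n$ units yields $n_1$ respondents and $n_2$ non-respondents, and a random subsample of $m = n_2/k$ non-respondents is followed up and assumed to respond in full. The estimator is $\bar{y} = (n_1\bar{y}_1 + n_2\bar{y}_{2m})/n$ with variance $V = (1-f)S^2/n + W_2(k-1)S_2^2/n$, where $f = n/N$, $W_2$ is the population proportion of non-respondents and $S_2^2$ is the variance within the non-response stratum. The function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def hansen_hurwitz_mean(
    respondents: Sequence[float],        # y-values of the n_1 first-attempt respondents
    nonresp_subsample: Sequence[float],  # y-values of the m = n_2 / k followed-up non-respondents
    n_nonresp: int,                      # n_2, non-respondents in the first-phase sample
) -> float:
    """Hansen-Hurwitz estimator: (n_1 * ybar_1 + n_2 * ybar_2m) / n."""
    n = len(respondents) + n_nonresp
    # The subsample mean stands in for all n_2 non-respondents.
    follow_up = n_nonresp * fmean(nonresp_subsample) if n_nonresp > 0 else 0.0
    # n_1 * ybar_1 is simply the sum of the respondent values.
    return (sum(respondents) + follow_up) / n


def hansen_hurwitz_variance(N: int, n: int, k: float, S2: float, W2: float, S2_nr: float) -> float:
    """V = (1 - f) S^2 / n + W_2 (k - 1) S_2^2 / n, with f = n / N."""
    f = n / N
    return (1 - f) * S2 / n + W2 * (k - 1) * S2_nr / n


print(hansen_hurwitz_mean([10.0, 12.0, 14.0], [20.0, 22.0], n_nonresp=4))  # 120/7, about 17.14
print(hansen_hurwitz_variance(N=1000, n=100, k=2, S2=25.0, W2=0.3, S2_nr=36.0))  # 0.225 + 0.108
```

With $k = 1$ every non-respondent is followed up and the variance reduces to the usual simple random sampling variance; the second term is the price paid for sub-sampling.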

Proposed Small Area Concept in the Presence of Non-Response
Let $U$ be a finite population of known size $N$. The population is divided into small area groups $U_1, U_2, \ldots, U_s$ with group sizes $N_1, N_2, \ldots, N_s$, respectively. The assumption is that

Bias of the Ratio Estimator
Where,

Optimal Allocation
An optimum sample size is required to balance the precision of the survey against its cost. Optimum allocation is attained either by maximizing precision (that is, minimizing the variance) for a given cost or by minimizing the cost for a given precision. In this study, a linear cost function is considered.
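As an illustration of minimizing cost for a given precision under a linear cost function, the sketch below applies the Lagrangian approach to the classical Hansen-Hurwitz setting (simple random sampling with a sub-sampling fraction $1/k$ of non-respondents), rather than to the ratio estimator developed in this study. It assumes an expected cost $E(C) = n(c_0 + c_1 W_1 + c_2 W_2/k)$, where $c_0$ is the cost per unit of the first attempt, $c_1$ the cost of processing a first-attempt response and $c_2$ the cost per unit of intensive follow-up. Writing the variance as $V + S^2/N = (A + kB)/n$ with $A = S^2 - W_2 S_2^2$ and $B = W_2 S_2^2$, the stationarity conditions give $k = \sqrt{c_2 A / ((c_0 + c_1 W_1) S_2^2)}$, after which the precision constraint fixes $n$. The function name and parameters are hypothetical.

```python
import math
from typing import Tuple


def optimal_n_and_k(
    S2: float,      # S^2, population variance of the study variable
    S2_nr: float,   # S_2^2, variance within the non-response stratum
    W2: float,      # W_2, population proportion in the non-response stratum
    c0: float,      # cost per unit of the first attempt
    c1: float,      # cost per unit of processing a first-attempt response
    c2: float,      # cost per unit of the intensive follow-up
    V0: float,      # required variance of the estimator
    N: float = math.inf,  # population size (inf drops the finite-population term)
) -> Tuple[float, float, float]:
    """Minimise E(C) = n (c0 + c1 W1 + c2 W2 / k) subject to
    V = (A + k B) / n - S^2 / N = V0, with A = S^2 - W2 S2_nr and B = W2 S2_nr.

    Eliminating n through the constraint (equivalently, via a Lagrange
    multiplier on V) leaves (a + b / k)(A + k B) to minimise over k, giving
    k = sqrt(b A / (a B)) with a = c0 + c1 W1 and b = c2 W2.
    """
    W1 = 1.0 - W2
    a = c0 + c1 * W1
    b = c2 * W2
    A = S2 - W2 * S2_nr
    B = W2 * S2_nr
    if a <= 0 or b <= 0 or A <= 0 or B <= 0:
        raise ValueError("costs and variance components must be positive")
    k = max(1.0, math.sqrt(b * A / (a * B)))  # 1/k is a fraction, so k >= 1
    n = (A + k * B) / (V0 + S2 / N)           # precision constraint fixes n
    expected_cost = n * (a + b / k)
    return n, k, expected_cost


n, k, cost = optimal_n_and_k(S2=100.0, S2_nr=100.0, W2=0.5, c0=0.5, c1=1.0, c2=8.0, V0=1.5)
print(n, k, cost)  # 100.0 2.0 300.0
```

Because $E(C)\,(V_0 + S^2/N)$ depends only on $k$, the sub-sampling constant is chosen first and $n$ then follows from the precision requirement; $k$ is clamped at 1 because the sub-sampling fraction cannot exceed one.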
Denote the cost function for the ratio estimation by ( ) The optimal values of n and k are given by To obtain the normal equations, the expression of Equation (22)