Optimal Allocation in Domains Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

Studies have been carried out on domain mean estimation using a non-linear cost function. However, little has been done on domain stratum estimation using a non-linear cost function with ratio estimation in the presence of non-response. This study develops a method of optimal stratum sample size allocation in domain mean estimation using double sampling with a non-linear cost function in the presence of non-response. To obtain the optimum sample sizes, the Lagrangian multiplier technique is employed to maximize precision at a specified cost. In the estimation of the domain mean, auxiliary variable information is used, where both the study and auxiliary variables suffer from non-response in the second-phase sample. Expressions for the bias and mean square error of the proposed estimator are also obtained.


Domains
In sampling, estimates are often required for each of the classes into which the population is subdivided. Such subgroups or classes are known as domains of study. The units of a domain may sometimes be identified prior to sampling; such domains are called planned domains. For unplanned domains, the units cannot be identified prior to sampling, and hence the need for estimates for certain domains often becomes evident only after the sampling design has been fixed or after the sampling and field work have been completed. Hence the sample size of an unplanned domain cannot be controlled. The sample sizes for these sub-populations are random variables, since the formation of the sub-populations is unrelated to the sampling.
According to Eurostat [4], precision thresholds and/or minimum effective sample sizes are set for planned domains. The minimum sample size required to achieve a relative margin of error of $100k\%$ for the domain total $Y_d$ depends on population quantities, such as $S$, that are unknown and have to be estimated using data from auxiliary sources.
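As a rough illustration of how a relative margin of error translates into a minimum sample size, the sketch below assumes simple random sampling within the domain and a normal approximation. The function name `min_sample_size`, the coefficient of variation `cv` (taken as $S/\bar{Y}$ estimated from auxiliary data) and the quantile `z` are assumptions introduced here; they are not the Eurostat [4] formula itself.

```python
import math


def min_sample_size(cv, k, N=None, z=1.96):
    """Smallest SRS sample size with relative margin of error at most k.

    cv : assumed coefficient of variation S / Ybar in the domain
    k  : target relative margin of error (0.05 means 5%)
    N  : domain population size; if given, the finite population
         correction is applied
    z  : normal quantile for the confidence level (1.96 for about 95%)
    """
    # Requirement without finite population correction:
    #   z * cv / sqrt(n) <= k   =>   n >= (z * cv / k)^2
    n0 = (z * cv / k) ** 2
    if N is not None:
        # With the correction (1/n - 1/N) the bound becomes n0 / (1 + n0 / N).
        n0 = n0 / (1.0 + n0 / N)
    return math.ceil(n0)


if __name__ == "__main__":
    print(min_sample_size(cv=1.0, k=0.10))
    print(min_sample_size(cv=1.0, k=0.10, N=1000))
```

Because `cv` is itself an estimate, the resulting size is best read as a planning figure rather than a guarantee of the achieved precision.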

Optimal Allocation with Non-Linear Cost Function
Optimal sample allocation involves determining the sample sizes $n_1, n_2, \ldots, n_H$ that minimize the variances of the various characters under a given sampling budget $C$ (where $C$ is the upper limit of the total cost of the survey). A linear cost function is appropriate when the cost involved is associated with the non-travel activities of the survey, e.g. drawing the sample, preparing survey methods, locating, identifying and interviewing respondents, and coding data. Generally, a linear cost function is mostly applicable when the major cost item is that of taking the measurements on each unit, without considering the cost of the distance between the sample units.
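As a hedged illustration of allocation under a linear cost, the sketch below assumes the textbook form $C = c_0 + \sum_h c_h n_h$ and the classical result that, for a fixed budget, the variance of the stratified mean is minimized when $n_h$ is proportional to $N_h S_h / \sqrt{c_h}$. The symbols $c_0$ (overhead cost), $c_h$ (per-unit cost in stratum $h$), $N_h$, $S_h$ and the function name are assumptions for illustration, not notation taken from this paper.

```python
import math


def linear_cost_allocation(N_h, S_h, c_h, budget, c0=0.0):
    """Real-valued stratum sample sizes that exhaust a linear budget.

    Assumed cost model: budget = c0 + sum(c_h * n_h).
    Optimal n_h is proportional to N_h * S_h / sqrt(c_h).
    """
    weights = [N * S / math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    denom = sum(N * S * math.sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    return [(budget - c0) * w / denom for w in weights]


if __name__ == "__main__":
    n = linear_cost_allocation(
        N_h=[100, 200], S_h=[10.0, 5.0], c_h=[1.0, 4.0], budget=1000.0
    )
    print(n)
```

The returned sizes are not integers; in practice they would be rounded and capped at the stratum sizes $N_h$.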
Chernyak [1] proposed optimum allocation in double sampling for stratification with a non-linear cost function.
Okafor and Lee [9] employed the double sampling method to estimate the mean of the auxiliary variable and proceeded to estimate the mean of the study variable in a similar way to Cochran [3]. In this method, double sampling ratio and regression estimation were considered. The distribution of the auxiliary information was not known, and hence the first-phase sample was used to estimate the population distribution of the auxiliary variable, while the second-phase sample was used to obtain the required information on the variable of interest. The optimum sampling fraction was derived for the estimators at a fixed cost. The performance of the proposed estimators was compared with that of the Hansen and Hurwitz [5] estimator without considering the cost. It was noted that, for the results in which the cost component was not considered, the regression estimators were more consistent than the Hansen and Hurwitz [5] estimator.
Tschuprow [11] and Neyman [8] proposed the allocation procedure that minimizes the variance of the sample mean under a linear cost function of the sample size. Neyman [8] used the Lagrange multiplier optimization technique to obtain optimum sample sizes for a single variable under study. Holmberg [6] addressed the problem of compromise allocation in multivariate stratified sampling by minimizing the sum of the variances or coefficients of variation of the estimators of the population parameters, or the sum of the efficiency losses that may result from the increase in variance due to the use of a compromise allocation. Saini [10] developed a method of optimum allocation for a multivariate stratified two-stage sampling design by using double sampling. In this method, the problem of determining optimum allocations was formulated as a set of non-linear programming problems (NLPPs), each with a convex objective function under a single linear constraint. The Lagrange multiplier technique was used to solve the formulated NLPPs.
Khan et al. [7] proposed a quadratic cost function for allocating sample sizes in multivariate stratified random sampling in the presence of non-response, in which a separate linear regression estimator is used. For this multi-objective non-linear integer programming problem, extended lexicographic goal programming was used to obtain the solution, and a comparison was made with the individual optimum techniques. It was observed that, among the allocation techniques, extended lexicographic goal programming gives smaller values of the coefficient of variation than the individual optimum and goal programming techniques. Choudhry [2] considered sample allocation issues in the context of estimating sub-population (stratum and domain) means as well as the aggregate population mean under stratified simple random sampling. In this method, non-linear programming was used to obtain the optimal sample allocation to the strata that minimizes the total sample size subject to a specified tolerance on the coefficient of variation of the estimators of the strata and population means.
From the previous studies, a number of researchers have considered a linear cost function when estimating domains. In dealing with non-response, most of them have considered sub-sampling while holding to the idea that the response mechanism is deterministic. This paper therefore focuses on the estimation of the domain mean using double sampling for ratio estimation with a non-linear cost function and a random response mechanism. In this study we therefore establish an efficient and cost-effective method of estimating domains when the travel component is included and the cost function is non-linear.
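To make the role of a non-linear travel component concrete, the following sketch allocates stratum sample sizes numerically under an assumed cost of the form $C = c_0 + \sum_h c_h n_h + \sum_h t_h \sqrt{n_h}$, minimizing $\sum_h W_h^2 S_h^2 / n_h$. The square-root travel term, the symbols $t_h$, $W_h$, $S_h$ and all function names are assumptions made for illustration; this is not the cost function or the estimator variance derived in this paper. The Lagrange condition $W_h^2 S_h^2 = \lambda\,(c_h n_h^2 + \tfrac{1}{2} t_h n_h^{3/2})$ is solved by bisection, with an outer bisection on $\lambda$ so that the budget is met.

```python
import math


def solve_increasing(f, target, lo=1e-9, hi=1.0):
    """Find x > 0 with f(x) = target for an increasing f, by bisection."""
    while f(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocation_for_lambda(lam, W, S, c, t):
    """Solve the stationarity condition in each stratum for a given multiplier."""
    sizes = []
    for W_h, S_h, c_h, t_h in zip(W, S, c, t):
        def g(n, c_h=c_h, t_h=t_h):
            return lam * (c_h * n ** 2 + 0.5 * t_h * n ** 1.5)
        sizes.append(solve_increasing(g, (W_h * S_h) ** 2))
    return sizes


def total_cost(n, c, t, c0=0.0):
    """Assumed cost: overhead + per-unit cost + square-root travel cost."""
    return c0 + sum(
        c_h * n_h + t_h * math.sqrt(n_h) for n_h, c_h, t_h in zip(n, c, t)
    )


def nonlinear_cost_allocation(W, S, c, t, budget, c0=0.0):
    """Real-valued n_h minimizing sum(W_h^2 S_h^2 / n_h) at the given budget."""
    if budget <= c0:
        raise ValueError("budget must exceed the overhead cost c0")
    lo, hi = 1e-12, 1.0
    # The total cost decreases as the multiplier grows, so bracket it first.
    while total_cost(allocation_for_lambda(hi, W, S, c, t), c, t, c0) > budget:
        hi *= 2.0
    while total_cost(allocation_for_lambda(lo, W, S, c, t), c, t, c0) < budget:
        lo /= 2.0
    for _ in range(100):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if total_cost(allocation_for_lambda(mid, W, S, c, t), c, t, c0) > budget:
            lo = mid
        else:
            hi = mid
    return allocation_for_lambda(math.sqrt(lo * hi), W, S, c, t)


if __name__ == "__main__":
    W, S, c, t = [1 / 3, 2 / 3], [10.0, 5.0], [1.0, 4.0], [5.0, 8.0]
    n = nonlinear_cost_allocation(W, S, c, t, budget=1000.0, c0=50.0)
    print(n, total_cost(n, c, t, c0=50.0))
```

Setting every $t_h$ to zero reduces the sketch to the linear-cost case, which gives a simple check on the numerical solution.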

Introduction
The problem of non-response is inherent in many surveys, and it often persists even after call-backs. Estimates obtained from incomplete data will be biased, especially when the respondents differ from the non-respondents. The non-response error is not so important if the characteristics of the non-responding units are similar to those of the responding units. However, such similarity between the two types of units (responding and non-responding) is not always attainable in practice. In double sampling, when non-response is present, the strata are virtually divided into two disjoint and exhaustive groups of respondents and non-respondents. A sub-sample is then selected from the non-responding group, and a second, more extensive attempt is made to obtain the required information from it. Hansen and Hurwitz [5] proposed a technique of adjusting for non-response to address the problem of bias. The technique consists of selecting a sub-sample of the non-respondents and, through specialized efforts, obtaining an estimate for the non-responding units in the population. Although this sub-sampling procedure is costly, it is free from any assumption; hence one does not have to aim for a hundred percent response, which can be substantially more expensive.
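The sub-sampling idea can be sketched as follows: the respondent mean and the mean of the followed-up sub-sample of non-respondents are combined with weights equal to the shares of respondents and non-respondents in the original sample. This is a minimal sketch of the Hansen and Hurwitz [5] type of estimator for a single sample; the function and argument names are hypothetical, and the domain and double sampling structure of this paper are not included.

```python
def hansen_hurwitz_mean(respondents, followup_subsample, n_nonrespondents):
    """Non-response adjusted sample mean using a followed-up sub-sample.

    respondents        : observed values from units that responded
    followup_subsample : values obtained from the sub-sample of
                         non-respondents after extra effort
    n_nonrespondents   : number of non-respondents in the original sample
    """
    n1 = len(respondents)
    n2 = n_nonrespondents
    ybar1 = sum(respondents) / n1
    if n2 == 0:
        # Everyone responded, so no adjustment is needed.
        return ybar1
    ybar2 = sum(followup_subsample) / len(followup_subsample)
    # Weight each group mean by its share of the original sample.
    return (n1 * ybar1 + n2 * ybar2) / (n1 + n2)


if __name__ == "__main__":
    print(hansen_hurwitz_mean([10, 12, 14], [20], n_nonrespondents=2))
```

In this toy call the respondent mean is 12, while the adjusted estimate is pulled towards the sub-sample value because two of the five sampled units did not respond.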
In developing the concept of domain theory with non-response, the following assumptions are made:
i. Both the domain study and auxiliary variables suffer from non-response.
ii. The responding and non-responding units are the same for the study and auxiliary characters.
iii. The information on the domain auxiliary variable $X_d$ is not known, and hence the domain mean $\bar{X}_d$ is not available.
iv. The domain auxiliary variables do not suffer from non-response in the first phase of sampling but suffer from non-response in the second phase of sampling.

Proposed Domain Estimators
Let $U$ be a finite population with $N$ known first-stage units. The finite population is divided into $D$ domains. In estimating the overall domain population mean in the presence of non-response, double sampling ratio estimation of the domain mean is used. Define; With the assumption that,

Mean Square Error of the Ratio Estimator
The expression for the Mean square error (MSE) of Where, Next, Consider,

Mean Square Error (MSE) of the Ratio Estimator $\hat{R}$
Substituting the values of equations (5) we obtain The mean square error (MSE) of the ratio estimator is given by;