Domain Mean Estimation Using Double Sampling with Non-Linear Cost Function in the Presence of Non-Response

This paper describes the theoretical estimation of domain means using double sampling with a non-linear cost function in the presence of non-response. Estimation of the domain mean is proposed using auxiliary information, where both the study and auxiliary variables suffer from non-response in the second-phase sample. Expressions for the biases and mean square errors of the proposed estimators are obtained. The optimal stratum sample sizes for a given non-linear cost function are developed.


Domains
A domain is a subgroup of the whole target population of the survey for which specific estimates are needed. In sampling, estimates are often made for each of the classes into which the population is subdivided; for instance, the focus may be not only the unemployment rate of the entire population but also its breakdown by age, gender and education level. Units of domains may sometimes be identified prior to sampling. In such cases, each domain can be treated as a separate stratum from which a specific sample is taken. Stratification ensures a satisfactory level of representativeness of the domains in the final sample. Such domains are called planned domains.

Domain Estimation
Consider a finite population under study $U$ of size $N$ divided into $D$ domains $U_1, U_2, \ldots, U_D$.
Domain membership of any population unit is unknown before sampling. It is assumed that the domains are quite large, and for a typical $d$-th domain $U_d$ several characteristics may be defined, as described by Gamrot [4]. These include; ... of a non-informative Bayesian approach in which a Polya posterior is used on a finite population for which there is little or no prior information. Although a prior distribution is not specified, there is a posterior distribution which may be used to make inferences. Udofia [12] proposed estimation of domains using double sampling with probabilities proportional to size (PPS) with known constituent domain. The assumptions made by Udofia [12] are;
(i) The size of the auxiliary variable $X$ is not known.
(ii) The distribution of the variable $Z$ that defines the domain is not known a priori, and therefore the population size $N_{h(j)}$ of the domain is also unknown.
(iii) The cost of measuring the variables $X$ and $Z$ in each stratum is much lower than that of measuring the study variable $Y$.
Aditya et al. [1] developed a method of estimating the domain total for unknown domain size in the presence of non-response with a linear cost function, using a two-stage sampling design. In this method the response mechanism is assumed to be deterministic.
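The setting described at the start of this section, where domain membership is observed only after the sample is drawn, can be illustrated with a minimal sketch. It assumes a simple random sample and uses the plain sample mean within each observed domain; the function name `domain_means` and the toy data are hypothetical and are not taken from any of the cited papers.

```python
from statistics import mean


def domain_means(y, domain):
    """Estimate each domain mean from a simple random sample (assumed design).

    y      : sampled values of the study variable
    domain : domain label observed for each sampled unit after sampling
    Returns {label: (n_d, y_bar_d)}.
    """
    groups = {}
    for value, label in zip(y, domain):
        groups.setdefault(label, []).append(value)
    # The domain sample size n_d is random because membership is
    # not known before the sample is drawn.
    return {label: (len(vals), mean(vals)) for label, vals in groups.items()}


# Hypothetical toy sample: six units falling into two domains.
sample_y = [12.0, 15.0, 9.0, 20.0, 11.0, 14.0]
sample_domain = ["a", "b", "a", "b", "a", "b"]
estimates = domain_means(sample_y, sample_domain)
```

Because the number of sampled units falling in each domain is not fixed in advance, the domain sample sizes returned alongside the means are themselves random quantities.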

Double Sampling in the Presence of Auxiliary Information
In many sampling procedures, prior knowledge of the population mean of the auxiliary variable is required. If no such information is available, it is easier and cheaper to take a large initial sample on which the auxiliary variable is measured and from which population parameters such as the total, mean or frequency distribution of the auxiliary variable $X$ are estimated. Srivastava [11] proposed a large class of ratio and product estimators in double sampling. It was found that the asymptotic minimum variance for any estimator of this class is equal to that generally believed to be attained by the linear regression estimator. According to Sahoo and Panda [10], if an experimenter knows the population mean of an additional auxiliary variable, say $Z$, whereas the population mean of the auxiliary variable $X$ is unknown and can be estimated using a double sampling scheme, it is possible to come up with a class of estimators for the finite population mean $\mu_Y$.
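The idea of replacing the unknown auxiliary mean by its estimate from a large first-phase sample can be sketched with the classical double sampling ratio estimator, in which the second-phase ratio of means is multiplied by the first-phase mean of $X$. This is a minimal illustration with full response; the function name and the numbers are hypothetical, and it is not the class of estimators studied by Srivastava [11] or Sahoo and Panda [10].

```python
def double_sampling_ratio_mean(x_first_phase, x_second_phase, y_second_phase):
    """Classical double sampling ratio estimate of the mean of Y.

    x_first_phase  : X measured on the large, cheap first-phase sample
    x_second_phase : X measured on the smaller second-phase subsample
    y_second_phase : Y measured on the same second-phase subsample
    """
    x_bar_first = sum(x_first_phase) / len(x_first_phase)
    x_bar_second = sum(x_second_phase) / len(x_second_phase)
    y_bar_second = sum(y_second_phase) / len(y_second_phase)
    # The second-phase ratio y_bar / x_bar is scaled by the first-phase
    # mean of X, which stands in for the unknown population mean of X.
    return y_bar_second / x_bar_second * x_bar_first


# Hypothetical data: eight first-phase units, three second-phase units.
x_first = [10.0, 12.0, 8.0, 14.0, 6.0, 10.0, 12.0, 8.0]
x_second = [12.0, 14.0, 10.0]
y_second = [24.0, 30.0, 18.0]
ratio_estimate = double_sampling_ratio_mean(x_first, x_second, y_second)
```

In this toy example the second-phase units have a larger mean of $X$ than the first-phase sample, so the ratio adjustment pulls the estimate below the unadjusted second-phase mean of $Y$.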

Double Sampling for the Ratio Estimator in the Presence of Non-Response
Hansen and Hurwitz [5] proposed a way of dealing with non-response that addresses the bias problem. In their approach, a sub-sample is taken from the non-respondents to obtain an estimate for the sub-population represented by the non-respondents. Cochran [2] employed the Hansen and Hurwitz [5] technique and proposed ratio and regression estimation of the population mean of the study variable, where the auxiliary variable information is obtained from all the sample units while some of the sample units fail to supply information on the study variable. According to Oh and Scheuren [8] and Kalton and Kasprzyk [6], non-response is often compensated for by weighting adjustment and imputation respectively. They argued that the procedures used in weighting adjustment and imputation aim at eliminating the bias due to non-response. Okafor and Lee [9] employed the double sampling method to estimate the mean of the auxiliary variable and went on to estimate the mean of the study variable in a similar way to Cochran [2]. In this method double sampling for ratio and regression estimation was considered. The distribution of the auxiliary information was not known, and hence the first-phase sample was used to estimate the population distribution of the auxiliary variable while the second phase was used to obtain the required information on the variable of interest. The optimum sampling fraction for the estimators for a fixed cost was derived. The performances of the proposed estimators were computed and compared with those of the Hansen and Hurwitz [5] estimator without considering the cost. It was noted that, when the cost component was not considered, the regression estimators were more consistent than the Hansen and Hurwitz [5] estimator. Chaudhary and Kumar [3] proposed a method of estimating the mean of a finite population using a double sampling scheme under non-response.
The proposed model was based on the fact that both the study and auxiliary variables suffered from non-response, with the information on X not available. Hence the estimate of X at the first phase is given by, with the corresponding variance of, where S are the mean square errors of the entire group and of the non-responding group respectively, with / L as the inverse sampling rate at the first phase of sampling.
From the previous studies, a number of researchers have considered a linear cost function when estimating domains. In dealing with non-response, most of them have considered sub-sampling while assuming that the response mechanism is deterministic. This study therefore focuses on the estimation of the domain mean using double sampling for ratio estimation with a non-linear cost function and a random response mechanism. We establish an efficient and cost-effective method of estimating domains when a travel cost component is included and is not linear. The problem of minimum variance and cost is addressed while considering a non-linear cost function and optimal sample size.

Developing Domain Concept Theory with Non-Response
The problem of non-response is inherent in many surveys and persists even after call-backs. Estimates obtained from incomplete data will be biased, especially when the respondents differ from the non-respondents. The non-response error is less important if the characteristics of the non-responding units are similar to those of the responding units. However, such similarity between the two types of units is not always attainable in practice. In double sampling, when non-response is present, each stratum is virtually divided into two disjoint and exhaustive groups of respondents and non-respondents. A sub-sample is then selected from the non-responding group and a second, more extensive attempt is made to obtain the required information from it. Hansen and Hurwitz [5] proposed a technique of adjusting for non-response to address the problem of bias. The technique consists of selecting a sub-sample of the non-respondents and applying specialized efforts to obtain an estimate for the non-responding units in the population. This sub-sampling procedure, although costly, is free of any assumptions; hence one does not have to aim for a hundred percent response, which can be substantially more expensive.
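The sub-sampling technique of Hansen and Hurwitz [5] described above can be sketched in its standard single-phase form: the respondent mean and the mean of the followed-up sub-sample of non-respondents are combined with weights proportional to the number of respondents and non-respondents in the original sample. The function name and the data are hypothetical, and the sketch ignores domains and auxiliary information.

```python
def hansen_hurwitz_mean(y_respondents, n_nonrespondents, y_subsample):
    """Hansen-Hurwitz estimate of the population mean under non-response.

    y_respondents    : values from the n1 units that responded at first contact
    n_nonrespondents : number n2 of units that did not respond
    y_subsample      : values obtained by follow-up from a sub-sample
                       of the n2 non-respondents
    """
    n1 = len(y_respondents)
    n2 = n_nonrespondents
    y_bar_resp = sum(y_respondents) / n1
    y_bar_sub = sum(y_subsample) / len(y_subsample)
    # The sub-sample mean represents all n2 non-respondents,
    # so it receives weight n2 / (n1 + n2).
    return (n1 * y_bar_resp + n2 * y_bar_sub) / (n1 + n2)


# Hypothetical data: 4 respondents, 6 non-respondents, 3 followed up.
responded = [10.0, 12.0, 14.0, 12.0]
followed_up = [20.0, 22.0, 18.0]
hh_estimate = hansen_hurwitz_mean(responded, 6, followed_up)
```

Using only the respondents here would give a mean of 12, whereas the weighted estimate moves toward the non-respondent sub-sample mean of 20, which is the bias correction the technique is designed to provide.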
In developing the concept of domain theory with non-response, the following assumptions are made;
i. Both the domain study and auxiliary variables suffer from non-response.
ii. The responding and non-responding units are the same for the study and auxiliary characters.
iii. The information on the domain auxiliary variable $X_d$ is not known and hence $\bar{X}_d$ is not available.
iv. The domain auxiliary variable does not suffer from non-response in the first-phase sample but suffers from non-response in the second-phase sample.

Proposed Domain Estimators
Let $U$ be a finite population with $N$ known first-stage units. The finite population is divided into $D$ domains; similarly, the estimate for the domain auxiliary variable is given by; with the assumption that,

Bias and Mean Square Error of the Ratio Estimator
The expression for the mean square error (MSE) of with the assumption that where, consider, next, consider,

Bias of Ratio Estimator
The bias of the ratio estimator $Y_{dR_2}$ is given by,

Mean Square Error (MSE) of the Ratio Estimator $\hat{R}$
Substituting the values from equation (5), we obtain

Estimation of Sample Size in the Presence of Non-Response
Estimation of the domain mean is developed using a double sampling design based on the technique of sub-sampling the non-respondents for both the study and auxiliary variables, with unknown domain size. Survey costs are therefore considered, and a non-linear cost function is employed in obtaining the optimal sample sizes by minimizing the variance for a fixed cost.
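Minimizing variance for a fixed cost under a non-linear cost function can be illustrated numerically. The sketch below is not the cost function or variance expression of this paper: it assumes a generic two-phase variance of the form $V = v_1/n' + v_2/n$ and a hypothetical cost $C = c_1 n' + c_2 n + t\sqrt{n}$, where the square-root term stands in for a travel component. All names and numbers are illustrative assumptions, and the optimum is found by direct search over integer sample sizes rather than in closed form.

```python
import math


def total_cost(n_first, n_second, c_first, c_second, travel):
    # Assumed cost: linear per-unit costs plus a travel term
    # that grows like the square root of the second-phase size.
    return c_first * n_first + c_second * n_second + travel * math.sqrt(n_second)


def variance(n_first, n_second, v_first, v_second):
    # Assumed generic two-phase variance form V = v_first/n' + v_second/n.
    return v_first / n_first + v_second / n_second


def best_allocation(budget, c_first, c_second, travel, v_first, v_second):
    """Search integer (n_first, n_second) minimising variance within budget."""
    best = None
    max_second = int(budget // c_second)
    for n_second in range(2, max_second + 1):
        spent = c_second * n_second + travel * math.sqrt(n_second)
        # For a given n_second, the largest affordable n_first
        # gives the smallest variance.
        n_first = int((budget - spent) // c_first)
        if n_first < n_second:
            continue  # the second phase must be a subsample of the first
        v = variance(n_first, n_second, v_first, v_second)
        if best is None or v < best[2]:
            best = (n_first, n_second, v)
    return best


# Hypothetical budget, unit costs and variance components.
allocation = best_allocation(budget=1000.0, c_first=1.0, c_second=10.0,
                             travel=20.0, v_first=4.0, v_second=25.0)
```

A direct search is used because the square-root term prevents the simple closed-form allocation available for a linear cost; with an analytical variance expression the same problem would typically be handled with a Lagrangian.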

Optimal Allocation in Double Sampling for the Estimation of Domain
Let,