Optimal Nonparametric Regression Estimation of Finite Population Total Using Nadaraya Watson Incorporating Jackknifing

In this study, a model-based approach is adopted and a robust jackknifed Nadaraya Watson estimator of the finite population total is proposed. It is obtained by incorporating the jackknife procedure into the nonparametric regression estimator (the Nadaraya Watson case). The study estimates the finite population total using the proposed estimator (Jackknifed Nadaraya Watson). It also reviews the various approaches to estimating finite population totals and their properties. To measure the performance of each estimator, the study considers three criteria: the average bias, efficiency as measured by the mean squared error, and robustness as measured by the rate of change of efficiency. A numerical study using simulated populations was used to examine the performance of the proposed estimator and to compare it with existing estimators (Horvitz-Thompson, Nadaraya Watson and Ratio estimators). The simulation experiment showed that the proposed estimator gives better results in terms of bias and mean squared error (MSE).


Introduction
In many complex surveys, available information about the study population can be used at the design and estimation stages. It supports efficient procedures for estimating finite population quantities, such as the population total or mean, and so increases the precision of the estimators of these quantities. Such information can come from national censuses, official registers, natural resource inventories and remote sensing data. Since estimation is the main concern in surveys, emphasis is usually placed on the use of auxiliary information. One approach is to assume a working model, most often a linear model describing the relationship between the survey variable and the auxiliary variable, and to derive estimators from that model. However, efficient use of such estimators requires prior knowledge of the specific parametric structure of the population. This is usually problematic, especially if the model is to be used for many variables [1]. Because of these concerns, more attention has been given to nonparametric models describing the relationship between the auxiliary variables and the study variables [2]. The idea of nonparametric estimation of finite population totals traces its origin to the work of [3]. Nonparametric estimation is often more robust and flexible than inference based on parametric regression models or on design probabilities (as in design-based inference) [4]. A variety of approaches exist for constructing more efficient estimators, including model-based and design-based methods. The model-based approach rests on superpopulation models, which assume that the population under study is a realization of random variables following a superpopulation model. The model is used to predict the non-sampled values of the population and hence the finite population quantities [5].
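The prediction idea in the last two sentences can be written out in standard model-based notation. This notation is ours and may differ from the paper's: $s$ is the sample, $r = U \setminus s$ is the set of non-sampled units, and $\hat{m}$ is an estimate of the model mean function $m$.

$$T = \sum_{i \in s} Y_i + \sum_{j \in r} Y_j, \qquad \hat{T} = \sum_{i \in s} Y_i + \sum_{j \in r} \hat{m}(x_j)$$

Under a superpopulation model $Y_j = m(x_j) + \varepsilon_j$ with $E(\varepsilon_j) = 0$, the unobserved $Y_j$ are replaced by their predictions. A nonparametric method such as Nadaraya Watson estimates $m$ without assuming a parametric form for it.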
[6] first considered nonparametric superpopulation models and obtained a local polynomial regression estimator as a generalization of the ordinary generalized regression estimator. Their simulations showed that the proposed estimator performed better than the parametric estimators considered. [7] improved on the estimator of [6] and developed a model-based local polynomial regression estimator applicable to direct sampling designs, i.e. simple random sampling and systematic sampling. Their estimator gave better results than that of [6]. In this study, auxiliary information is used to estimate the finite population total using nonparametric regression in the

Review of the Jackknifing Estimator
Jackknifing is one of the resampling techniques commonly used to reduce the bias of estimators and to assess their variability. In the general jackknife procedure, the sample is split into $g$ groups of size $h$ each, so that $n = gh$. The procedure is as follows. Let $Y_1, Y_2, \ldots, Y_n$ be a sample of independent and identically distributed random variables. Let $\hat{\theta}_n$ be an estimator of the parameter $\theta$ based on the sample of size $n$. Let $\hat{\theta}_{(-i)}$ be the corresponding estimator based on the sample of size $(g-1)h$, where the $i$th group of size $h$ has been deleted. [8] and [9] defined the pseudo-values

$$\tilde{\theta}_i = g\,\hat{\theta}_n - (g-1)\,\hat{\theta}_{(-i)}, \qquad i = 1, 2, \ldots, g. \qquad (1)$$

The jackknife estimator is then their average,

$$\hat{\theta}_J = \frac{1}{g}\sum_{i=1}^{g} \tilde{\theta}_i = g\,\hat{\theta}_n - \frac{g-1}{g}\sum_{i=1}^{g} \hat{\theta}_{(-i)}. \qquad (2)$$

It is important to note that the estimator (2) eliminates the order $1/n$ term from a bias of the form

$$E(\hat{\theta}_n) = \theta + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots$$
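The grouped jackknife can be sketched in a few lines of Python as follows. It computes the pseudo-values $g\hat{\theta}_n - (g-1)\hat{\theta}_{(-i)}$ and averages them over the $g$ groups. Two choices here are illustrative assumptions rather than part of the procedure: the helper names, and forming groups as consecutive blocks. As a check, the plug-in variance (divisor $n$) has bias exactly $-\sigma^2/n$. Its delete-one jackknife ($g = n$, $h = 1$) is known to reproduce the unbiased sample variance (divisor $n-1$).

```python
import numpy as np


def grouped_jackknife(y, estimator, g):
    """Grouped (Quenouille-Tukey) jackknife estimate.

    y: 1-D sample of length n; g: number of groups, required to divide n
    so that n = g * h. Groups are consecutive blocks (an assumption; any
    fixed partition into equal groups works the same way).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n % g != 0:
        raise ValueError("g must divide the sample size n")
    h = n // g
    theta_n = estimator(y)                         # full-sample estimate
    theta_minus = np.empty(g)
    for i in range(g):
        keep = np.ones(n, dtype=bool)
        keep[i * h:(i + 1) * h] = False            # delete the i-th group of size h
        theta_minus[i] = estimator(y[keep])        # estimate from (g - 1) * h units
    pseudo = g * theta_n - (g - 1) * theta_minus   # pseudo-values
    return float(pseudo.mean())                    # average of the pseudo-values


def plug_in_variance(y):
    """Biased variance estimator with divisor n (bias = -sigma^2 / n)."""
    return float(np.var(y, ddof=0))


rng = np.random.default_rng(0)
sample = rng.normal(size=20)
print(grouped_jackknife(sample, plug_in_variance, g=20))  # delete-one jackknife
print(np.var(sample, ddof=1))                              # should match the line above
```

For a linear statistic such as the sample mean, the jackknife returns the statistic unchanged for any valid $g$, because there is no $1/n$ bias term to remove.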

Jackknifing in Model Based Estimation of Finite Population Totals
This section gives the algorithm for estimating the total using the Nadaraya Watson estimator combined with the jackknife technique. The delete-one jackknife, which sequentially deletes a single case from the original sample, is adopted. Suppose that a database consists of $n$ vectors $(y, x_1, x_2, \ldots, x_p)$, where $y$ is the study variable and $x_1, x_2, \ldots, x_p$ are auxiliary variables. Let $x = (x_1, x_2, \ldots, x_p)$, and let $z_i = (y_i, x_i)$, $i = 1, 2, \ldots, n$, denote the values associated with the $i$th observation, so that the set of observations is the vector $(z_1, z_2, \ldots, z_n)$. The delete-one jackknife procedure is then as follows: i. Draw a random sample of size $n$ from the population and label its elements $z_i = (y_i, x_i)$, $i = 1, 2, 3, \ldots, n$.
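Step i can be sketched in Python, together with the $n$ leave-one-out subsamples that a delete-one jackknife works with. The following details are assumptions for illustration only: simple random sampling without replacement, the hypothetical population, and the helper names. `np.column_stack` also accepts an `(n, p)` array for several auxiliary variables, in which case each row is $(y_i, x_{i1}, \ldots, x_{ip})$.

```python
import numpy as np


def draw_sample(y_pop, x_pop, n, rng):
    """Step i: draw a simple random sample without replacement (assumed
    design) of size n and label z_i = (y_i, x_i), i = 1, ..., n."""
    idx = rng.choice(len(y_pop), size=n, replace=False)
    z = np.column_stack([y_pop[idx], x_pop[idx]])  # row i holds z_i = (y_i, x_i)
    return idx, z


def delete_one_subsamples(z):
    """Yield (i, z_(-i)): the sample with the i-th observation deleted."""
    for i in range(z.shape[0]):
        yield i, np.delete(z, i, axis=0)


rng = np.random.default_rng(1)
x_pop = rng.uniform(0.0, 1.0, size=500)                  # hypothetical auxiliary x
y_pop = 1 + 2 * x_pop + rng.normal(0.0, 0.1, size=500)   # hypothetical study y
idx, z = draw_sample(y_pop, x_pop, n=50, rng=rng)
subs = list(delete_one_subsamples(z))
print(len(subs), subs[0][1].shape)  # 50 (49, 2)
```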

The Proposed Estimator
Dorfman [3] introduced a nonparametric regression estimator of the finite population total based on a sample drawn from the population. Considering a population of $N$ units, the author sought to estimate the finite population total

$$T = \sum_{i=1}^{N} Y_i.$$

We propose the jackknife estimate of the population total, $\hat{T}_J$, obtained by applying the delete-one jackknife to the Nadaraya Watson estimator. In deriving the asymptotic variance of the error term, it can be shown that the error term has zero expectation, so its variance reduces to its expected square. The remaining terms are evaluated similarly. Using $n - 1 \approx n$ as $n$ becomes large then yields the asymptotic variance of $\hat{T}_J$.
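The following Python sketch shows one way to build a jackknifed Nadaraya Watson total. It is a hedged construction under stated assumptions, not the paper's exact estimator. It assumes the prediction form $\hat{T}_{NW} = \sum_{i \in s} Y_i + \sum_{j \notin s} \hat{m}(x_j)$, with $\hat{m}$ the Nadaraya Watson smoother using a Gaussian kernel and bandwidth `h`. It then applies the delete-one jackknife $n\hat{T}_{NW} - \frac{n-1}{n}\sum_i \hat{T}_{NW,(-i)}$. When unit $i$ is deleted, the sketch assumes that unit is treated as non-sampled and predicts it from the other $n - 1$ units. The population, kernel and function names are hypothetical.

```python
import numpy as np


def nw_predict(x_new, x_s, y_s, h):
    """Nadaraya-Watson estimate of m(x) at each point of x_new.

    Uses a Gaussian kernel (an assumption made for this sketch).
    """
    u = (np.asarray(x_new, dtype=float)[:, None] - x_s[None, :]) / h
    w = np.exp(-0.5 * u ** 2)              # K((x - x_i) / h) for every pair
    return (w @ y_s) / w.sum(axis=1)       # kernel-weighted mean of y_s


def nw_total(x_pop, sample_idx, y_s, h):
    """Prediction-form total: observed y summed over the sample plus NW
    predictions over non-sampled units. y_s[k] belongs to unit sample_idx[k]."""
    in_sample = np.zeros(len(x_pop), dtype=bool)
    in_sample[sample_idx] = True
    x_s = x_pop[sample_idx]
    x_r = x_pop[~in_sample]                 # non-sampled units
    return y_s.sum() + nw_predict(x_r, x_s, y_s, h).sum()


def jackknife_nw_total(x_pop, sample_idx, y_s, h):
    """Delete-one jackknife (g = n groups of size 1) applied to nw_total.

    ASSUMPTION: a deleted unit is treated as non-sampled, so its value is
    predicted from the remaining n - 1 units.
    """
    n = len(sample_idx)
    t_full = nw_total(x_pop, sample_idx, y_s, h)
    t_minus = np.array([
        nw_total(x_pop, np.delete(sample_idx, i), np.delete(y_s, i), h)
        for i in range(n)
    ])
    # n * T_hat - ((n - 1) / n) * sum_i T_hat_(-i)
    return n * t_full - (n - 1) * t_minus.mean()


rng = np.random.default_rng(2)
N, n, h = 1000, 100, 0.1                    # hypothetical sizes and bandwidth
x_pop = rng.uniform(0.0, 1.0, size=N)
y_pop = 1 + 2 * (x_pop - 0.5) ** 2 + rng.normal(0.0, 0.1, size=N)
sample_idx = rng.choice(N, size=n, replace=False)
y_s = y_pop[sample_idx]

print("true total:", y_pop.sum())
print("NW total:  ", nw_total(x_pop, sample_idx, y_s, h))
print("JNW total: ", jackknife_nw_total(x_pop, sample_idx, y_s, h))
```

A quick sanity check is a constant population. There, every fitted value equals the constant, so both totals should return exactly $N$ times that constant.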

Description of the Population
Four populations are considered in this study, generated from the regression model of [10].
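Populations of this kind can be generated in Python with the sketch below. It draws $y_i = m(x_i) + e_i$ with homoscedastic normal errors of standard deviation 0.1, the value used in the simulations. The following are hypothetical placeholders and are not taken from [10]: the two mean functions (linear and quadratic, the two structures discussed in the results), the uniform design for $x$, and the population size.

```python
import numpy as np

# Hypothetical mean functions, not taken from [10].
MEAN_FUNCTIONS = {
    "linear": lambda x: 1 + 2 * (x - 0.5),
    "quadratic": lambda x: 1 + 2 * (x - 0.5) ** 2,
}


def generate_population(kind, N, sd=0.1, seed=0):
    """Return (x, y) with y_i = m(x_i) + e_i and homoscedastic e_i ~ N(0, sd^2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=N)              # hypothetical design for x
    y = MEAN_FUNCTIONS[kind](x) + rng.normal(0.0, sd, size=N)
    return x, y


for kind in MEAN_FUNCTIONS:
    x, y = generate_population(kind, N=1000, seed=42)
    print(kind, x.shape, round(float(y.mean()), 3))
```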

Simulation Results
The results of the study are summarized in Tables 1 to 8 and Figures 1 to 4. For each population, the performance of each estimator is analyzed using the average bias and the mean squared error (MSE). The average bias measures how close, on average, an estimator is to the true value, while the MSE is used to assess the efficiency of an estimator. The simulation covered each combination of mean function, standard deviation (s.d. = 0.1) and optimal bandwidth h = 0.1429514. For each combination, 1000 replicate samples were drawn from each of the four (4) populations. In Table 3, the Jackknifed Nadaraya Watson estimator performs better than the Nadaraya Watson and Ratio estimators, but not better than the Horvitz-Thompson estimator. This is attributed to the mean function being correctly specified, with the other estimators remaining competitive.
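The two performance measures can be written in Python as below, given the $R$ replicate estimates $\hat{T}_1, \ldots, \hat{T}_R$ ($R = 1000$ here) and the true total $T$. The average bias is $\frac{1}{R}\sum_r(\hat{T}_r - T)$ and the MSE is $\frac{1}{R}\sum_r(\hat{T}_r - T)^2$. The function names and toy numbers are illustrative only.

```python
import numpy as np


def average_bias(estimates, true_total):
    """Average bias over R replicates: (1/R) * sum_r (T_hat_r - T)."""
    est = np.asarray(estimates, dtype=float)
    return float(np.mean(est - true_total))


def mse(estimates, true_total):
    """Mean squared error over R replicates: (1/R) * sum_r (T_hat_r - T)^2."""
    est = np.asarray(estimates, dtype=float)
    return float(np.mean((est - true_total) ** 2))


# Toy replicate estimates of a true total T = 100 (hypothetical numbers).
reps = [98.0, 101.0, 103.0, 96.0]
print(average_bias(reps, 100.0))  # -0.5
print(mse(reps, 100.0))           # 7.5
```

One common way to compare the efficiency of two estimators is the ratio of their MSEs computed on the same replicates.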
The biases of the Jackknifed Nadaraya Watson estimator are smaller than those of the Horvitz-Thompson and Ratio estimators but slightly larger than those of the Nadaraya Watson estimator. Tables 1 to 3 show negative biases for the Jackknifed Nadaraya Watson estimator, indicating that it tends to underestimate the true population total. The Nadaraya Watson estimator, by contrast, tends to overestimate the true population total, but only when the sample size is sufficiently large.
At large bandwidths, the confidence intervals produced by the Nadaraya Watson estimator are much tighter than those produced by the JNW, HT and Ratio estimators. At smaller bandwidths, however, the JNW estimator gives tighter intervals than the other estimators. The best-performing estimator is one whose coverage rate is close to the nominal confidence level and whose interval length is small. A shorter interval captures the true population total within a narrower range, so the results are more precise.
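Coverage rate and average interval length can be computed from the replicates as in the Python sketch below. It assumes normal-approximation intervals $\hat{T}_r \pm z\sqrt{\hat{v}_r}$ with $z = 1.96$ (nominal 95%). The interval form is an assumption, and the variance estimates $\hat{v}_r$ are simply taken as inputs. The function name and toy numbers are illustrative only.

```python
import numpy as np


def coverage_and_length(estimates, variances, true_total, z=1.96):
    """Normal-approximation intervals T_hat_r +/- z * sqrt(v_r) (assumed form).

    Returns (coverage rate, average interval length) over the replicates.
    """
    est = np.asarray(estimates, dtype=float)
    half = z * np.sqrt(np.asarray(variances, dtype=float))       # half-widths
    covered = (est - half <= true_total) & (true_total <= est + half)
    return float(covered.mean()), float(np.mean(2.0 * half))


# Four hypothetical replicates of a true total T = 100, each with v_r = 4.
rate, length = coverage_and_length([98.0, 101.0, 103.0, 90.0], [4.0] * 4, 100.0)
print(rate, length)  # coverage 0.75, average length about 7.84
```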

Conditional Properties of the Nonparametric Estimators
Figures 1 to 4 show the behavior of the conditional bias of each estimator under the different mean functions. In Figures 1 and 2, the Ratio and HT estimators performed well when a linear mean function was used. This is because the Ratio estimator is the Best Linear Unbiased Estimator (BLUE) under a linear model through the origin with variance proportional to $x$. The biases to the left of the population mean of the auxiliary variable ($\bar{x} = 248$) are small, and they decrease further towards the right. As the population grows larger, the Jackknifed Nadaraya Watson estimator performs better than all the other estimators.
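Conditional bias curves of this kind can be sketched in Python as follows. The sketch makes two assumptions that are not stated in the text. It conditions on each replicate's sample mean of the auxiliary variable, $\bar{x}_s$. It also averages the error $\hat{T}_r - T$ within equal-count bins of $\bar{x}_s$. Replicates whose $\bar{x}_s$ falls left of the population mean then appear in the lower bins. The function name and synthetic data are hypothetical.

```python
import numpy as np


def conditional_bias(xbar_s, estimates, true_total, n_bins=10):
    """Sort replicates by their sample mean of x, split them into n_bins
    equal-count bins, and return (bin centres, mean of T_hat - T per bin)."""
    xbar_s = np.asarray(xbar_s, dtype=float)
    err = np.asarray(estimates, dtype=float) - true_total
    order = np.argsort(xbar_s)
    centres, biases = [], []
    for chunk in np.array_split(order, n_bins):
        centres.append(xbar_s[chunk].mean())   # typical x-bar_s in this bin
        biases.append(err[chunk].mean())       # conditional bias in this bin
    return np.array(centres), np.array(biases)


# Synthetic replicates whose error grows linearly with x-bar_s (hypothetical).
rng = np.random.default_rng(4)
xbar = rng.permutation(np.arange(100.0))
T = 1000.0
est = T + (xbar - 49.5)
centres, biases = conditional_bias(xbar, est, T, n_bins=10)
print(np.round(centres, 1))
print(np.round(biases, 1))
```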
In Figures 3 and 4, quadratic mean functions were used. The proposed estimator (JNW) gives better estimates of the population total than the Nadaraya Watson estimator proposed by [3] and the Ratio estimator. The biases of JNW remain the lowest throughout.

Performance of the Estimators at Varying Bandwidths
From Tables 5 and 6, the two estimators (Nadaraya Watson and Jackknifed Nadaraya Watson) perform well under the linear, homoscedastic population structure. However, the Nadaraya Watson estimator has a lower bias than JNW. The biases of JNW are negative, indicating that it tends to underestimate the true population total. Small bias values indicate that the estimated population totals are close to the true value. The MSE values of JNW are lower than those of the Nadaraya Watson estimator, which implies that JNW is more efficient for a linear, homoscedastic population structure. For the quadratic population structure (Tables 7 and 8), the Nadaraya Watson estimator has a smaller absolute bias than the Jackknifed Nadaraya Watson estimator. In terms of bias, the Nadaraya Watson estimator is therefore the best estimator for a quadratic, homoscedastic population. The JNW estimator, however, has the lowest MSE, especially for small values of the bandwidth $h$, which makes it more efficient than the Nadaraya Watson estimator.