On Bootstrap Confidence Intervals Associated with Nonparametric Regression Estimators for a Finite Population Total

The precision of an estimator is often described in terms of its variance. The exact value of the variance is usually unknown, because it depends on unknown population quantities. Once survey data have been collected, however, the variance can be estimated. When survey results are presented, it is good practice to report variance estimates for the estimators used in the study. A variance estimate can further be used to construct a confidence interval, assuming that the sampling distribution of the estimator is approximately normal. This study proposes estimating the standard error and confidence interval of a nonparametric regression estimator of a finite population total by the bootstrap method. The idea behind the bootstrap is to carry out repeated computations on the collected data; these computations are used to estimate the variability of statistics that are themselves computed from the same data. The variance of the Nadaraya-Watson estimator is estimated using a bootstrap procedure, and this leads to a confidence interval associated with the Nadaraya-Watson estimator of the population total. A simulation study has been carried out. The overall conclusion is that the confidence interval associated with the Nadaraya-Watson estimator is narrower than those of the other estimators considered (the Horvitz-Thompson, local linear and ratio estimators).


Introduction
The theory and application of sample surveys have grown considerably over the last 50 years. A good survey should report a measure of precision for each estimate computed from data collected under the survey design. A commonly used measure of precision is the variance of the survey estimator. Several methods for estimating the variance of an estimator exist in the literature, and researchers face the challenge of choosing an appropriate one.
Kish and Frankel [16] assert that variance estimates should be computed in a way that reflects the complexity of the sample design, and that neglecting this complexity leads to errors. Moreover, obtaining measures of variation (variance, standard error, mean squared error) becomes more difficult for nonlinear statistics from complex surveys than for first-order (linear) statistics.
Efron [11] proposed the bootstrap method, a computer-based technique for estimating standard errors, biases, confidence intervals and other measures of statistical accuracy that does not require strong assumptions about the distribution of the data.
The idea behind the bootstrap is to resample the available sample repeatedly, creating multiple views of it. Taken together, the estimates from the resamples represent the plausible range of the estimate in the population, and an empirical confidence interval can then be obtained from this bootstrap distribution. There are many methods for deriving a confidence interval from a bootstrap distribution; they differ in how they use that distribution.
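As an illustration of the idea above, the sketch below computes a percentile bootstrap interval for a simple statistic from an i.i.d. sample. It is a generic sketch, assuming simple random resampling with replacement; the function name `percentile_bootstrap_ci` and its parameters (`B`, `alpha`, `seed`) are hypothetical and not taken from the study, and the percentile method is only one of the many interval methods mentioned.

```python
import math
import random
import statistics


def percentile_bootstrap_ci(sample, stat=statistics.fmean, B=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(sample), resampling i.i.d. with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate recomputes the statistic on a resample of size n drawn with replacement.
    replicates = sorted(stat(rng.choices(sample, k=n)) for _ in range(B))
    # Order statistics approximating the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution.
    lo_idx = max(0, math.floor(alpha / 2 * B))
    hi_idx = min(B - 1, math.ceil((1 - alpha / 2) * B) - 1)
    return replicates[lo_idx], replicates[hi_idx], replicates


# Usage: a synthetic sample (illustrative only).
data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
lower, upper, _ = percentile_bootstrap_ci(data)
print(f"mean = {statistics.fmean(data):.3f}, 95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

The replicates are returned sorted so that other interval methods (or a variance estimate) can be computed from the same bootstrap distribution.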
Nonparametric regression provides a computationally intensive way of estimating unknown finite population quantities. Such estimation is frequently more flexible and robust than inference tied to design-based approaches. The use of nonparametric regression for inference on finite populations falls essentially within the model-based approach. Within this framework, the present study is concerned with estimating the confidence interval associated with a nonparametric estimator of a finite population total using the bootstrap method.

Review of Nonparametric Estimation of Finite Population Total
The use of nonparametric regression for inference on finite populations is firmly within the model-based approach.
Dorfman [10] compared population total estimators constructed from the Nadaraya-Watson (nonparametric regression) estimator with the design-based Horvitz-Thompson estimator, and found that the nonparametric regression estimator of a finite population total is a potent rival to familiar design-based estimators. It shares the automatic character of design-based estimators but can better reflect the structure of the data, and hence yields greater efficiency. He also found that this regression estimator suffers from boundary bias, in addition to the difficulty of bandwidth selection.
F. J. Breidt and Opsomer [4] proposed a model-assisted nonparametric regression estimator for the finite population total, based on local polynomial smoothing under two-stage sampling. They found that the local polynomial regression estimator performed better than the Horvitz-Thompson estimator and substantially improved on the Nadaraya-Watson estimator.
Zheng & Little [25] were concerned with inference about the finite population total from probability-proportional-to-size (PPS) samples. They developed model-based, jackknife and balanced repeated replication variance estimation methods for p-spline based estimators. Their simulation study showed that p-spline point estimators and their jackknife standard errors lead to inference that is superior to Horvitz-Thompson or generalized regression (GR) based inference. This suggests that nonparametric model-based prediction approaches can be applied successfully in the finite population setting while avoiding strong parametric assumptions.
F. Breidt et al. [5] considered the estimation of finite population totals in the presence of auxiliary information and proposed a class of estimators based on penalized spline regression (a nonparametric estimator). Simulation experiments showed that the nonparametric estimators are more efficient than parametric regression estimators when the parametric model is misspecified, while being approximately as efficient when the parametric specification is correct.

Review of the Variance Estimation Techniques for Finite Population Total
Royall & Cumberland [19] considered variance estimation for setting large-sample confidence intervals about the best linear unbiased estimator when the model generating this estimator is inaccurate. A robust variance estimator was derived, and its asymptotic properties were shown to compare favorably with those of the weighted least squares variance estimator. The robust variance estimator was also shown to be asymptotically equivalent to the jackknife variance estimator under rather general conditions.
Binder [2] considered the problem of specifying and estimating the variance of estimated parameters based on complex sample designs from finite populations. The results are particularly useful when the parameter estimators cannot be defined explicitly as a function of other statistics from the sample. These results can be applied to linear regression, logistic regression, and log-linear contingency table models.
Särndal et al. [22] considered design-based estimation of the variance of the commonly used regression estimator of the finite population total. The usual Taylor linearization variance estimator is an expression in the design-weighted regression residuals; in many applications, the resulting expression is counterintuitive from a model-based standpoint. The improved variance estimator attaches an additional simple weight, called the "g-weight," to each residual. The results showed that the new variance estimator gives valid design-based confidence intervals, is nearly unbiased under a suitably chosen regression model, and performs well for conditional inference.

Review of the Confidence Interval for Nonparametric Estimation for the Finite Population Total
Given basic tools for simulation and resampling, it is a straightforward step to approximate quantities of interest whose true sampling distribution is unavailable (e.g., standard error, bias, standard deviation) by the corresponding quantities of the bootstrap distribution. For confidence intervals, the situation is more complicated, and a variety of approaches for producing approximate confidence intervals have been proposed by Davison and Hinkley [7] and DiCiccio and Efron [9].
Deshpande et al. [8] developed procedures for constructing nonparametric confidence intervals for population quantiles based on ranked-set sampling from a finite population. A simulation study based on finite populations showed that the sampling approaches follow a definite ordering with respect to the average lengths of the confidence intervals they produce. All three ranked-set sampling procedures tend to yield narrower confidence intervals than simple random sampling, with the difference being substantial for two of the protocols. The interpolated confidence intervals are shown to achieve coverage probabilities quite close to their nominal levels, and ranking according to a highly correlated concomitant variable is shown to reduce the coverage level only minimally.
Efron [12] considered the problem of setting confidence intervals for a single parameter in a multiparameter family. The standard approximate intervals based on maximum likelihood theory can be quite misleading, and tricks based on transformations, bias corrections and so forth are often used to improve their accuracy. The bootstrap confidence intervals discussed in his article incorporate such tricks automatically. The new intervals improve on previously suggested methods and achieve second-order correctness in a wide range of problems. Moreover, the bootstrap intervals are developed for nonparametric as well as parametric situations.
Zheng and Little [26] investigated penalized spline nonparametric mixed models for inference about a finite population mean from two-stage samples. Simulation studies showed that the model-based (nonparametric) estimator performed better than the Horvitz-Thompson estimator and linear model-assisted estimators. The simulations also showed that this estimator, combined with the variance estimation methods considered (empirical Bayes-based variance, jackknife and repeated replication), provided narrower confidence intervals with satisfactory coverage.

Nonparametric Regression Estimation
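According to Dorfman [27], the unknown regression function is estimated nonparametrically by a kernel-weighted average of the sampled responses, where the kernel is the standard normal kernel.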
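For reference, the standard Nadaraya-Watson form with a standard normal kernel can be written as below. The notation ($s$ for the sample, $h$ for the bandwidth, $m(\cdot)$ for the regression function) is assumed here for illustration and may differ from the symbols used in the original equations.

```latex
\hat{m}(x) = \frac{\sum_{i \in s} K\!\left(\frac{x - x_i}{h}\right) y_i}
                  {\sum_{i \in s} K\!\left(\frac{x - x_i}{h}\right)},
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right).
```

Because the weights $K\{(x - x_i)/h\} / \sum_{k \in s} K\{(x - x_k)/h\}$ are nonnegative and sum to one, $\hat{m}(x)$ is a locally weighted average of the sampled $y_i$, with $h$ controlling how quickly the weights decay with distance from $x$.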

Nonparametric Regression Estimator for Finite Population Total
The area of nonparametric regression provides a range of methods for estimating the regression function $m(\cdot)$, and many such methods exist in the literature. The common idea behind all of them is that the auxiliary variable $x$ provides a measure of closeness between points, which is used to estimate $m(x_j)$ as a weighted sum $\hat{m}(x_j) = \sum_{i \in s} w_{ij} y_i$, where the weights $w_{ij}$ depend on the distance of the sampled $x_i$ to $x_j$. Perhaps the most common version of this is Nadaraya-Watson kernel estimation.
It can be shown that … ; for the proof, see Dorfman [10]. Model-based estimators ignore sampling probabilities and stratum boundaries. Apart from the choice of bandwidth and a possible transformation of the auxiliary variable, the estimator is automatic.
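A minimal sketch of a Nadaraya-Watson estimator of the population total follows, assuming the usual model-based prediction form $\hat{T} = \sum_{i \in s} y_i + \sum_{j \notin s} \hat{m}(x_j)$ (the observed sample total plus kernel predictions for the non-sampled units), a standard normal kernel and a user-supplied bandwidth `h`. The function names are hypothetical, and no bandwidth selection rule is implemented.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_predict(x0, xs, ys, h):
    """Nadaraya-Watson estimate of m(x0) from sampled pairs (xs, ys) with bandwidth h."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # All kernel weights underflowed: h is too small relative to the distance to the sample.
        raise ValueError("bandwidth too small for this prediction point")
    # Normalised weights w_i = K((x0 - x_i)/h) / sum_k K((x0 - x_k)/h) sum to one.
    return sum(w * yi for w, yi in zip(weights, ys)) / total_weight


def nw_population_total(x_sample, y_sample, x_nonsample, h):
    """Model-based total: observed sample total plus NW predictions for non-sampled units."""
    observed = sum(y_sample)
    predicted = sum(nw_predict(xj, x_sample, y_sample, h) for xj in x_nonsample)
    return observed + predicted


# Usage with a tiny illustrative population split into sampled and non-sampled units.
x_s, y_s = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.1]
x_r = [1.5, 2.5, 3.5, 4.5]
print(f"NW prediction of the population total: {nw_population_total(x_s, y_s, x_r, h=0.8):.3f}")
```

Only the auxiliary values of the non-sampled units are needed, which is what makes the estimator usable when $x$ is known for the whole population.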

Variance of Nonparametric Regression Estimator for Finite Population Total
The variance of the nonparametric estimator $\hat{T}$ is formulated using the variance of the prediction error $\hat{T} - T$. Assuming that the second derivative of $m(\cdot)$ is bounded and continuous, and that $n, N \to \infty$, $h \to 0$ and $nh \to \infty$, the variance of $\hat{T}$ admits an asymptotic approximation. From equations (7) and (8), the variance of the nonparametric estimator $\hat{T}$ cannot be computed directly; in other words, it is formulated through the variance of the prediction error, which depends on unknown population quantities. Estimating this variance is therefore complicated, and because of these difficulties the bootstrap technique was used to estimate it, following the algorithm below.

Bootstrap Variance Estimation Technique
The bootstrap involves drawing a series of independent samples from the sampled observations, using the same sampling scheme as the one by which the initial sample was drawn from the population, and calculating an estimate for each bootstrap sample. The technique works as follows.
i. Using the sample data, construct an artificial population $U^*$, assumed to mimic the real but unknown population $U$.
ii. Draw a series of independent samples ("resamples" or "bootstrap samples") from $U^*$ by a design identical to the one by which $s$ was drawn from $U$. Independence implies that each bootstrap sample must be replaced into $U^*$ before the next one is drawn. For each bootstrap sample, calculate an estimate $\hat{T}^*_b$, $b = 1, \dots, B$, in the same way as $\hat{T}$ is calculated.
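The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the original sample is a simple random sample without replacement (SRSWOR) of size n, and the artificial population $U^*$ is built by copying each sampled unit N/n times, which requires N to be a multiple of n (one common construction; the study's own construction of $U^*$ is not spelled out here). The function names and the stand-in estimator `expansion_total` are hypothetical; in practice the Nadaraya-Watson total estimator would be passed as `estimator`.

```python
import random
import statistics


def artificial_population(x_sample, y_sample, N):
    """Step i: build U* by copying each sampled unit N // n times (assumes n divides N)."""
    n = len(x_sample)
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    ux = [xi for xi in x_sample for _ in range(k)]
    uy = [yi for yi in y_sample for _ in range(k)]
    return ux, uy


def bootstrap_totals(x_sample, y_sample, N, estimator, B=1000, seed=0):
    """Step ii: draw B SRSWOR resamples of size n from U* and re-apply the estimator."""
    rng = random.Random(seed)
    n = len(x_sample)
    ux, uy = artificial_population(x_sample, y_sample, N)
    replicates = []
    for _ in range(B):
        # Same design as the original sample: SRSWOR of size n, here drawn from U*.
        chosen = sorted(rng.sample(range(N), n))
        chosen_set = set(chosen)
        xs = [ux[i] for i in chosen]
        ys = [uy[i] for i in chosen]
        x_rest = [ux[i] for i in range(N) if i not in chosen_set]
        replicates.append(estimator(xs, ys, x_rest))
        # U* is never modified, so each resample is drawn from the complete U*
        # (the previous resample has effectively been replaced).
    return replicates


def expansion_total(xs, ys, x_rest):
    """Hypothetical stand-in estimator: sample total plus (N - n) * sample mean."""
    return sum(ys) + len(x_rest) * statistics.fmean(ys)


# Usage with illustrative data: n = 5 sampled units from a population of N = 20.
x_obs = [0.2, 0.4, 0.5, 0.7, 0.9]
y_obs = [1.3, 1.9, 2.2, 2.5, 3.0]
reps = bootstrap_totals(x_obs, y_obs, N=20, estimator=expansion_total, B=200)
print(f"{len(reps)} bootstrap totals, first three: {[round(t, 3) for t in reps[:3]]}")
```

Passing the estimator as a function keeps the resampling design separate from the estimator, so the same loop can be reused for each estimator being compared.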
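Using the bootstrap algorithm above, the variance of the nonparametric estimator $\hat{T}$ is estimated by the empirical variance of the bootstrap replicates $\hat{T}^*_1, \dots, \hat{T}^*_B$, and the corresponding bootstrap estimate of the standard error of $\hat{T}$ is the square root of this variance estimate.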
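The variance and standard error described above can be computed from the replicates as sketched below, assuming the usual form $v_B(\hat{T}) = \frac{1}{B-1}\sum_{b=1}^{B}(\hat{T}^*_b - \bar{T}^*)^2$ with $\bar{T}^* = B^{-1}\sum_{b=1}^{B}\hat{T}^*_b$, and $\widehat{\mathrm{SE}}_B(\hat{T}) = \sqrt{v_B(\hat{T})}$ (some authors divide by $B$ instead of $B-1$). The interval function applies the normal approximation $\hat{T} \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}_B(\hat{T})$; both function names are hypothetical.

```python
import math
import statistics


def bootstrap_variance_se(replicates):
    """Bootstrap variance (divisor B - 1) and standard error from replicates T*_1..T*_B."""
    var_b = statistics.variance(replicates)  # sum((T*_b - mean(T*))**2) / (B - 1)
    return var_b, math.sqrt(var_b)


def normal_bootstrap_ci(t_hat, replicates, level=0.95):
    """Normal-approximation interval t_hat -/+ z * SE_boot at the given confidence level."""
    _, se = bootstrap_variance_se(replicates)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return t_hat - z * se, t_hat + z * se


# Usage with illustrative replicate values (not results from the study).
reps = [101.2, 98.7, 100.4, 99.9, 102.3, 97.8, 100.1, 101.0]
var_b, se_b = bootstrap_variance_se(reps)
low, high = normal_bootstrap_ci(100.0, reps)
print(f"v_B = {var_b:.3f}, SE_B = {se_b:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```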

Empirical Study
To examine the statistical properties of the confidence interval estimators described above, a simulation study is performed. For simplicity, the errors are assumed to be independent and normally distributed with homogeneous variance, and only the case of a single auxiliary variable x is considered.
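A superpopulation simulation of this kind can be sketched as below, under the assumptions just stated (i.i.d. normal errors with a common variance, one auxiliary variable). The specific mean functions, the uniform distribution of x, and the values of N, n and sigma are hypothetical placeholders chosen for illustration; they are not the models examined in the study, and the "linear" and "jump" labels only echo population types named in the results discussion.

```python
import random

# Hypothetical mean functions m(x); placeholders, not the study's actual models.
MEAN_FUNCTIONS = {
    "linear": lambda x: 1.0 + 2.0 * x,
    "jump": lambda x: 1.0 + 2.0 * x + (3.0 if x > 0.65 else 0.0),
}


def simulate_population(model, N=1000, sigma=0.5, seed=0):
    """Generate a finite population from y = m(x) + e with i.i.d. e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    m = MEAN_FUNCTIONS[model]
    x = [rng.uniform(0.0, 1.0) for _ in range(N)]
    y = [m(xi) + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def draw_srswor(N, n, seed=0):
    """Indices of a simple random sample without replacement and of the non-sampled units."""
    rng = random.Random(seed)
    sampled = sorted(rng.sample(range(N), n))
    sampled_set = set(sampled)
    return sampled, [j for j in range(N) if j not in sampled_set]


# Usage: one population and one sample; in a simulation the true total is known.
x_pop, y_pop = simulate_population("jump", N=500, sigma=0.5, seed=42)
s, r = draw_srswor(500, 50, seed=1)
print(f"true total = {sum(y_pop):.2f}, sampled units = {len(s)}, non-sampled = {len(r)}")
```

Repeating the sample draw many times, applying each estimator and its bootstrap interval, and comparing against the known true total is what allows coverage and width to be assessed.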
The following superpopulation models are examined:

Results and Discussion
The nonparametric regression estimators perform satisfactorily relative to the other estimators. Parametric estimators perform best when the model is well specified; when the model is misspecified, however, nonparametric estimators can achieve superior efficiency. This can be seen in Table 1, which reports bootstrap estimates of the population totals for six simulated populations: the nonparametric estimators are more efficient than their parametric counterparts (i.e., the Nadaraya-Watson estimator estimates the population totals better than the Horvitz-Thompson and local polynomial estimators). The linear function is a correct specification for the ratio estimator, so under this model the ratio estimator performs better than the Horvitz-Thompson, Nadaraya-Watson and local linear estimators. Nonparametric regression estimators traditionally assume that the regression function is smooth. The regression function of the jump population does not satisfy this smoothness assumption, and in this setting the Horvitz-Thompson estimator outperforms the Nadaraya-Watson estimator.
Table 2 presents variance estimates of the population totals for different nonparametric estimators under different models. From Table 2, the variance of the Nadaraya-Watson estimator appears to be the smallest across all the models, and smaller variability is preferable. A confidence interval describes the uncertainty inherent in an estimator by giving a range of values within which we can be reasonably confident that the true value lies. If the interval is narrow, the effect size is easily identified.
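To make the comparison with the design-based competitors concrete, the Horvitz-Thompson and ratio estimators can be sketched as follows, assuming simple random sampling without replacement (so every inclusion probability is n/N) and a known population total X of the auxiliary variable. The function names are hypothetical and the study's exact design is not restated here. The usage illustrates why a correctly specified model helps the ratio estimator: when y is exactly proportional to x (a hypothetical, noiseless case), the ratio estimator recovers the true total, whereas the Horvitz-Thompson estimate depends on which units happen to be sampled.

```python
import statistics


def horvitz_thompson_total(y_sample, N):
    """HT estimator sum(y_i / pi_i); under SRSWOR pi_i = n / N, so it equals N * mean(y)."""
    pi = len(y_sample) / N
    return sum(yi / pi for yi in y_sample)


def ratio_total(x_sample, y_sample, X_total):
    """Ratio estimator (ybar / xbar) * X, where X is the known population total of x."""
    return statistics.fmean(y_sample) / statistics.fmean(x_sample) * X_total


# Usage: a hypothetical noiseless population with y proportional to x.
x_pop = [float(k) for k in range(1, 11)]
y_pop = [2.5 * xk for xk in x_pop]
idx = [0, 4, 9]  # an illustrative sample of n = 3 units
xs, ys = [x_pop[i] for i in idx], [y_pop[i] for i in idx]
print(f"true total {sum(y_pop)}, ratio {ratio_total(xs, ys, sum(x_pop)):.3f}, "
      f"HT {horvitz_thompson_total(ys, len(y_pop)):.3f}")
```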
A wider confidence interval expresses a higher level of uncertainty; it indicates that little is known about the effect and that further information is needed. The width of the confidence interval in a single study depends largely on the sample size: larger studies give more precise estimates and hence narrower confidence intervals than smaller studies. Table 3 compares 95% confidence intervals for the Nadaraya-Watson, ratio, local linear and Horvitz-Thompson estimators under the different mean functions. The confidence intervals generated by the Nadaraya-Watson estimator are much narrower than those generated by the Horvitz-Thompson, ratio and local linear estimators. The results indicate that the Nadaraya-Watson estimator performs better than the other estimators at the 95% nominal coverage level.
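Comparisons such as those in Table 3 rest on two summaries over the simulation runs: how often the interval contains the true total (empirical coverage) and how wide it is on average. A minimal sketch of both is given below; the function name is hypothetical and the study's exact summaries are not specified here. A narrower interval is only preferable if its empirical coverage stays close to the nominal 95% level.

```python
def coverage_and_width(intervals, true_total):
    """Empirical coverage rate and average width of simulated (lower, upper) intervals."""
    if not intervals:
        raise ValueError("need at least one interval")
    hits = sum(1 for low, high in intervals if low <= true_total <= high)
    widths = [high - low for low, high in intervals]
    return hits / len(intervals), sum(widths) / len(widths)


# Usage with illustrative intervals from three hypothetical simulation runs.
runs = [(95.0, 105.0), (98.0, 108.0), (101.0, 110.0)]
rate, mean_width = coverage_and_width(runs, true_total=100.0)
print(f"coverage = {rate:.2f}, average width = {mean_width:.2f}")
```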

Conclusions and Recommendations
This study aimed at constructing bootstrap confidence intervals for the finite population total using the Nadaraya-Watson estimator approach to nonparametric regression. The simulation study reveals that confidence intervals constructed with the Nadaraya-Watson estimator have better coverage and narrower width, which is desirable at any coverage probability, compared with those of the local linear, ratio and Horvitz-Thompson estimators.
In practice, the study therefore recommends using the bootstrap technique to estimate the variance and construct the confidence intervals associated with nonparametric regression estimators.
Further work could use the jackknife technique, balanced repeated replication and the random group method.