Population Total Estimation in a Complex Survey by Nonparametric Model Calibration Using Penalty Function Method with Auxiliary Information Known at Cluster Levels

Nonparametric methods are a rich class of statistical tools that have gained acceptance in most areas of statistics. They have been used in the past by researchers to fit missing values in the presence of auxiliary variables in a sample survey. Nonparametric methods have been preferred to parametric methods because they make it possible to analyze data, estimate trends and conduct inference without having to fully specify a parametric model for the data. This study therefore presents some new approaches to the complex survey setting through the nonparametric imputation of missing values using both penalized splines and neural networks. More precisely, the study adopted neural networks and penalized splines to estimate the functional relationship between the survey variable and the auxiliary variables. The complex survey data were sampled through a cluster-strata design, in which a population is divided into clusters which are in turn subdivided into strata. Once missing values have been imputed, the study performs a model calibration with auxiliary information assumed completely available at the cluster level. The reasoning behind model calibration is that if the calibration constraints are satisfied by the auxiliary variable, then the fitted values of the variable of interest should be expected to satisfy those constraints too. The population total estimators are derived by treating the calibration problems at cluster level as optimization problems and solving them by the method of penalty functions. A Monte Carlo simulation was run to assess the finite-sample performance of the estimators under complex survey data, and the efficiency of the estimators was checked by the MSE criterion. The penalized spline and neural network model calibration estimators were compared with the Horvitz-Thompson estimator.
From the results, the performance of the two nonparametric estimators is close to that of the Horvitz-Thompson estimator, and both are unbiased and consistent.


Introduction
The concept of auxiliary variables in current statistical scholarship denotes independent or predictor variables in a regression analysis. As the name suggests, these variables offer additional information and may be used to improve the estimation of population parameters. As noted by [1], micro-econometric research is frequently performed using data collected by surveys, which may contain auxiliary information for every unit of the population of interest. As can be expected, many of these surveys use complex sampling plans to reduce costs and increase the estimation efficiency for subgroups of the population. Although the term complex survey has been used by researchers to refer to different combinations of sampling plans, in this study it refers to a mixture of both stratified and cluster sampling methods [2].
The process of estimating the population total and mean starts with the point estimation of the missing values based on auxiliary variables, using either parametric or nonparametric regression estimation; estimators are then classified according to the nature of the working model used at the estimation stage [11]. As a result, to obtain the estimator of interest, this study involved two stages of estimation. Stage one was to obtain point estimates of the missing values using penalized splines and neural networks based on the auxiliary information at cluster level. The second stage applied the model calibration technique to the fitted cluster totals to obtain the population total, as discussed in sections 2.0 and 3.0 of this paper, respectively. In model calibration, a distance measure defined on some design weights, thought to be close to the inverse inclusion probabilities, is minimized subject to calibration constraints imposed on the fitted values of the study variable. In the penalty technique, the minimization is done by introducing a penalty function whose solution gives the optimal design weights to be used in the estimation of the population total. This study derived nonparametric model calibration estimators by treating the calibration problem as a nonlinear constrained minimization problem, which was in turn transformed into an unconstrained optimization problem using penalty functions.
[7] employed neural networks for imputation with auxiliary information coming from administrative registers. The use of neural networks for model calibration in that study was new and allows for more flexible prediction and straightforward insertion of auxiliary information. A more complex model and generalized calibration procedure using model calibration was proposed by [12]. These scholars considered generalized linear models and nonparametric regression models for the superpopulation model $\psi$ given by the equation below.
$$y_i = h(x_i) + \varepsilon_i, \qquad (1)$$

where $\varepsilon_i$ is a sequence of independent and identically distributed random variables with $E_\psi(\varepsilon_i) = 0$ and $V_\psi(\varepsilon_i) = \sigma^2$, and $h(\cdot)$ is a smooth function which can be estimated by any nonparametric method such as kernels, neural networks or penalized splines. Given $n$ pairs of sample observations $(x_1, y_1), \ldots, (x_n, y_n)$ from a population of size $N$, of interest is the estimator $\hat{h}(x)$ of $h(x) = E_\psi(y \mid x)$. For model calibration, calibration is performed on the population mean of the fitted values $\hat{h}(x_i)$ [12]. The study considers a model calibration estimator of the population total $t_y$ given below.
$$\hat{t}_y = \sum_{i \in b} w_i\, y_i, \qquad (2)$$

where $b$ is the set of sampled units under a general sampling design and the $w_i$'s are design weights which, for a given metric, are as close as possible in an average sense to the $z_i = 1/\pi_i$, where $\pi_i$ is the inclusion probability. These weights are obtained by minimizing a given distance measure between the $w_i$'s and the $z_i$'s subject to some constraints. The chi-squared distance measure to be minimized is as provided in the equation below.
$$\Phi_b = \sum_{i \in b} \frac{(w_i - z_i)^2}{z_i q_i}, \qquad (3)$$

where the $q_i$'s are known positive constants uncorrelated with the $z_i$'s [4], subject to the two constraints

$$\sum_{i \in b} w_i = N \qquad (4)$$

$$\sum_{i \in b} w_i\, \hat{h}(x_i) = \sum_{i=1}^{N} \hat{h}(x_i). \qquad (5)$$
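Under the chi-square distance in equation (3), the calibrated weights have a well-known closed form via a Lagrange multiplier. Below is a minimal Python sketch of this step (the paper's own computations were done in R; the function name `calibrate_weights` and its interface, with the constraints supplied as a matrix of calibration variables and their known totals, are illustrative assumptions):

```python
import numpy as np

def calibrate_weights(z, X, totals, q=None):
    """Chi-square-distance calibration weights.

    Minimizes sum_i (w_i - z_i)^2 / (z_i * q_i) subject to X^T w = totals,
    where z_i = 1/pi_i are the design weights, X is an (n, p) matrix of
    calibration variables and totals holds their known population totals.
    """
    z = np.asarray(z, float)
    X = np.asarray(X, float)
    q = np.ones_like(z) if q is None else np.asarray(q, float)
    zq = z * q
    # Lagrange multiplier: (X^T diag(z q) X) lam = totals - X^T z
    lam = np.linalg.solve(X.T @ (zq[:, None] * X), totals - X.T @ z)
    return z * (1.0 + q * (X @ lam))
```

Calibrating on a column of ones together with the fitted values reproduces constraints of the form (4) and (5) exactly.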
The model calibration method is intended to provide good efficiency when the model is correctly specified, while maintaining desirable properties such as design consistency when the model is misspecified [10]. This study adopted a model calibration rather than an internal calibration approach because authors such as [5] showed that model calibrated estimators performed better than internally calibrated estimators. [6] proposed the use of nonparametric methods to obtain the fitted values; in particular, they used neural networks and local polynomials to fit the missing values for clustered data in one-stage sampling. An extension of this to two-stage sampling using kernel functions was done by [8] to fit the mean functions. Any nonparametric method, such as kernel methods, can be used to recover the fitted values for the non-sampled units. Such estimators are, however, challenging to employ in cases of multiple covariates, especially when data are sparse. Another challenge involves incorporating categorical covariates. It is therefore important to consider other techniques for recovering the fitted values, such as penalized splines and neural networks, when data are complex, as discussed in section 2.0 of this paper.

Fitting of Missing Values
In this section, the study considered fitting missing values for a population divided into clusters which are then subdivided into strata, for the case where auxiliary information is known at the cluster level only. The cluster total is the variable of interest and is assumed to depend on an auxiliary variable $x$. Define $x_1, x_2, \ldots, x_C$ as a population of auxiliary variables of size $C$, with $x_i$ known at the $i$th cluster. The population of clusters $F$ is partitioned into $C$ clusters, the $i$th of size $N_i$, $i = 1, 2, \ldots, C$; further, each cluster contains $L_i$ strata, the $j$th of size $N_{ij}$, $j = 1, 2, \ldots, L_i$. Let $y_{ijk}$ be the $k$th observation in the sample from the $j$th stratum of the $i$th cluster and $x_i$ be the corresponding auxiliary variable at cluster level. At stage one, a probability sample $c$ of $m$ clusters is drawn from the $C$ clusters according to a fixed design $P_1(\cdot)$ (here simple random sampling), where $P_1(c)$ is the probability of drawing the sample $c$, and the first-order cluster inclusion probabilities $\pi_i$ are obtained for the sampled clusters. This study used the [3] estimator to obtain the estimated cluster total

$$\hat{t}_i = \sum_{j=1}^{L_i} \sum_{k \in s_{ij}} \frac{y_{ijk}}{\pi_{jk|i}}, \qquad (6)$$

where $s_{ij}$ is the sample from the $j$th stratum of cluster $i$ and $\pi_{jk|i}$ is the corresponding element inclusion probability given that cluster $i$ is sampled. For penalized splines, this study considered the population of clusters $F$ of size $C$ for which several values of the random variable were missing. A matrix $X_r$ was considered with rows

$$X_{ri}^{T} = \left(1,\ x_i,\ \ldots,\ x_i^{q},\ (x_i - k_1)_{+}^{q},\ \ldots,\ (x_i - k_l)_{+}^{q}\right), \quad i \in F, \qquad (7)$$

where $q$ is the degree of the spline and $k_1, \ldots, k_l$ are the knots. Further, $X_{rc}$ is the submatrix of $X_r$ consisting of the rows $X_{ri}^{T}$ for which the cluster $i \in c$, and $D = \mathrm{diag}(0, \ldots, 0, \alpha, \ldots, \alpha)$ with $q + 1$ zeros on the diagonal followed by $l$ penalty constants $\alpha$.
The study considered the diagonal matrix of inverse inclusion probabilities $\Pi^{-1} = \mathrm{diag}(1/\pi_i)$ and let $\psi_1$ denote a super population model for the clusters. To fit the missing values at the cluster level, the study defined a nonparametric population estimator. If the fits are based on penalized splines, then the design-weighted penalized spline smoother vector at $x_i$, due to Breidt and Opsomer (2000), is

$$s(x_i)^{T} = X_{ri}^{T}\left(X_{rc}^{T}\,\Pi^{-1} X_{rc} + D\right)^{-1} X_{rc}^{T}\,\Pi^{-1}, \qquad (8)$$

and equation (8) is such that, when applied to the vector $t_c$ of estimated cluster totals, it yields the nonparametric sample fit at $x_i$,

$$\hat{h}_1(x_i) = s(x_i)^{T}\, t_c. \qquad (9)$$

On the other hand, the study considered the Montanari and Ranalli (2003) neural network structure for the smooth function $h(\cdot)$, defined by

$$h(x, \theta) = \phi_0\!\left(\beta_0 + \sum_{m=1}^{M} a_m\, \phi\!\left(\gamma_{0m} + \sum_{v=1}^{Q} \gamma_{vm}\, x_v\right)\right). \qquad (10)$$

In this case $M$ represents the number of neurons in the hidden layer; $a_m \in \Re$, for $m = 1, \ldots, M$, is the weight of the connection of the $m$th hidden node with the response variable; $\gamma_{vm} \in \Re$, for $m = 1, \ldots, M$ and $v = 1, \ldots, Q$, is the weight attached to the connection between the $v$th auxiliary variable and the $m$th hidden node. The functions $\phi_0$ and $\phi$ represent the activation functions of the response variable and of the $M$ neurons in the hidden layer, respectively. The present study considered $M$ as fixed and denoted the set of all parameters of the network by $\theta$, given as

$$\theta = \left(\beta_0,\ a_1, \ldots, a_M,\ \gamma_{01}, \ldots, \gamma_{0M},\ \gamma_{11}, \ldots, \gamma_{QM}\right). \qquad (11)$$

To estimate the regression function in equation (10), [12] proposed obtaining a design-consistent estimate $\hat{\theta}$ of $\theta$ and, therefore, of the regression function at $x_i$ for the fitted values.
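The design-weighted penalized spline smoother in equation (8) can be sketched in a few lines, assuming a truncated power basis of the stated form (the sketch is in Python, while the paper used R; the names `pspline_basis` and `pspline_fit` and their argument conventions are illustrative):

```python
import numpy as np

def pspline_basis(x, knots, q=2):
    """Truncated power basis of degree q: [1, x, ..., x^q, (x - k)_+^q]."""
    x = np.asarray(x, float)
    poly = np.vander(x, q + 1, increasing=True)            # 1, x, ..., x^q
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** q
    return np.hstack([poly, trunc])

def pspline_fit(x_sample, t_sample, pi, x_all, knots, q=2, alpha=1.0):
    """Design-weighted penalized spline fit of the cluster totals.

    Returns fitted values h_hat(x) at every cluster-level x in x_all,
    given estimated totals t_sample and inclusion probabilities pi
    for the sampled clusters.
    """
    Xc = pspline_basis(x_sample, knots, q)
    Xr = pspline_basis(x_all, knots, q)
    Pinv = np.diag(1.0 / np.asarray(pi, float))
    # D penalizes only the truncated-power (knot) coefficients
    D = np.diag(np.r_[np.zeros(q + 1), np.full(len(knots), alpha)])
    beta = np.linalg.solve(Xc.T @ Pinv @ Xc + D, Xc.T @ Pinv @ t_sample)
    return Xr @ beta
```

Because the penalty matrix $D$ leaves the polynomial coefficients unpenalized, a quadratic trend in the cluster totals is recovered essentially exactly when the ridge constant is small.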
Once the estimates $\hat{\theta}$ are obtained, the available auxiliary information is included in the estimator through the fitted values

$$\hat{h}_2(x_i) = h(x_i, \hat{\theta}), \qquad i = 1, \ldots, C, \qquad (12)$$

which, when applied at the sampled clusters, yield the nonparametric neural network fits. Using the fitted values in equations (9) and (12), this study proposed two types of model calibrated population total estimators, one based on the neural network ($\hat{y}_{NN}$) and one based on penalized splines ($\hat{y}_{PS}$), for auxiliary information available at cluster level under the cluster-strata design, with the general form

$$\hat{t}_y = \sum_{i \in c} w_i\, \hat{t}_i. \qquad (13)$$

Given the inverse inclusion probability $z_i = 1/\pi_i$, the weight $w_i$ for the penalized spline estimator was obtained by minimizing the chi-square distance measure in equation (3) subject to the constraints

$$\sum_{i \in c} w_i = C, \qquad \sum_{i \in c} w_i\, \hat{h}_1(x_i) = \sum_{i=1}^{C} \hat{h}_1(x_i). \qquad (14)$$

The weight $w_i$ for the NN estimator was obtained by minimizing the same chi-square distance measure subject to the constraints

$$\sum_{i \in c} w_i = C, \qquad \sum_{i \in c} w_i\, \hat{h}_2(x_i) = \sum_{i=1}^{C} \hat{h}_2(x_i). \qquad (15)$$
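The single-hidden-layer network used for the smooth function can be sketched likewise. The paper does not specify how the design-consistent parameter estimate was computed; the plain design-weighted gradient descent below is a stand-in assumption, as are the function names, the identity output activation and the fixed learning-rate settings:

```python
import numpy as np

def nn_forward(x, theta):
    """Single-hidden-layer network: b0 + sum_m a_m * tanh(g0_m + g_m * x)."""
    b0, a, g0, g = theta
    hidden = np.tanh(g0[None, :] + x[:, None] * g[None, :])
    return b0 + hidden @ a

def nn_fit(x, t, pi, M=4, steps=8000, lr=0.02, seed=0):
    """Design-weighted least-squares fit by plain gradient descent.

    Minimizes sum_i (t_i - h(x_i; theta))^2 / pi_i over the network
    parameters theta = (b0, a, g0, g).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    t = np.asarray(t, float)
    w = 1.0 / np.asarray(pi, float)
    n = len(t)
    b0 = float(np.average(t, weights=w))        # start at the weighted mean
    a = rng.normal(0.0, 0.5, M)
    g0 = rng.normal(0.0, 0.5, M)
    g = rng.normal(0.0, 0.5, M)
    for _ in range(steps):
        hid = np.tanh(g0[None, :] + x[:, None] * g[None, :])
        r = w * (b0 + hid @ a - t)              # weighted residuals
        dh = r[:, None] * a[None, :] * (1.0 - hid**2)
        b0 -= lr * r.sum() / n
        a -= lr * (hid.T @ r) / n
        g0 -= lr * dh.sum(axis=0) / n
        g -= lr * (x[:, None] * dh).sum(axis=0) / n
    return b0, a, g0, g
```

The fitted values $\hat{h}_2(x_i)$ at any cluster-level $x$ are then obtained with `nn_forward`.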

Penalty Function Method of Obtaining the Weights
The optimal weights $w_i$ are estimated by the penalty function method, which transforms the basic constrained optimization problem into an unconstrained optimization problem. In nonparametric model calibration estimation, this study followed the same procedure as [9]. The weight $w_i$ in equation (13) was obtained by minimizing the chi-square distance measure in equation (3) subject to the constraints in equations (14) and (15) for the penalized spline and neural network estimators, respectively.
In this case, $\hat{t}_{h_i}$ denotes the nonparametric fit of the missing cluster total at the cluster level ($\hat{h}_1(x_i)$ or $\hat{h}_2(x_i)$). Here, the calibration constraint is defined on the fitted values in equations (9) and (12); this is called model calibration. The study then constructed the unconstrained problem

$$\tau(w_i, g_c) = \sum_{i \in c} \frac{(w_i - z_i)^2}{z_i q_i} + T(g_c)\, G(w_i), \qquad (16)$$

where $T(g_c)\, G(w_i)$ is a penalty function. Following the same procedure as Rao (1984), equation (16) becomes

$$\tau(w_i, g_c) = \sum_{i \in c} \frac{(w_i - z_i)^2}{z_i q_i} + T(g_c) \left(\sum_{i \in c} w_i\, \hat{t}_{h_i} - \sum_{i=1}^{C} \hat{t}_{h_i}\right)^{q}, \qquad (17)$$

where $T(g_c)$ is some function of the penalty parameter $g_c$ which tends to infinity as $g_c$ tends to zero. In this study, the penalty terms are chosen such that their values are small at points away from the constraint boundaries and tend to infinity as the constraint boundaries are approached; as a result, the value of $\tau$ also blows up as the constraint boundaries are approached. Choosing $q = 2$, the penalty function built from the constraints in equation (14) for the penalized spline (and, analogously, equation (15) for the neural network) becomes

$$\tau(w_i, g_c) = \sum_{i \in c} \frac{(w_i - z_i)^2}{z_i q_i} + T(g_c) \left(\sum_{i \in c} w_i\, \hat{t}_{h_i} - \sum_{i=1}^{C} \hat{t}_{h_i}\right)^{2}. \qquad (18)$$

Differentiating equation (18) partially with respect to $w_i$ gives

$$\frac{\partial \tau}{\partial w_i} = \frac{2(w_i - z_i)}{z_i q_i} + 2\, T(g_c)\, \hat{t}_{h_i} \left(\sum_{j \in c} w_j\, \hat{t}_{h_j} - \sum_{j=1}^{C} \hat{t}_{h_j}\right). \qquad (19)$$

Equating (19) to zero and solving for $w_i$, the weight becomes

$$w_i = z_i - z_i q_i\, \hat{t}_{h_i}\, T(g_c) \left(\sum_{j \in c} w_j\, \hat{t}_{h_j} - \sum_{j=1}^{C} \hat{t}_{h_j}\right). \qquad (20)$$

A general weighted nonparametric estimator of the population total in equation (13), for penalized splines and NN, based on the cluster-strata design when auxiliary information is known at the cluster level only, is therefore

$$\hat{t}_y = \sum_{i \in c} w_i\, \hat{t}_i, \qquad (21)$$

with the weights $w_i$ given by equation (20). To obtain the weights in equation (21), this study solved the penalty problem (18) as an unconstrained minimization using an iterative procedure: starting with initial guesses for $w_i$ and $g_c$, the values are improved iteratively until optimal values are obtained. The study followed the Newton method of unconstrained optimization according to [9]: if $W_i$ is an initial estimate of the optimum $W^{*}$, so that $W^{*} = W_i + X$, a Taylor series expansion of the gradient of $\tau$ about $W_i$, with the higher-order terms neglected, yields the Newton step, and the resulting sequence of points converges to the improved approximation of $W^{*}$.
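The penalty-function computation of the weights can be sketched as below. For clarity the sketch enforces only the model-calibration constraint (the benchmark constraint is handled the same way), takes $T(g_c) = 1/g_c$ as an illustrative choice of penalty weight, and exploits the fact that each penalized subproblem is quadratic in the weights, so the stationarity condition is solved exactly at every penalty level while $g_c$ is driven toward zero:

```python
import numpy as np

def penalty_weights(z, h, Th, q=None, g0=1.0, rho=0.1, iters=8):
    """Exterior penalty method for the calibrated weights.

    Minimizes  sum_i (w_i - z_i)^2 / (z_i q_i) + (1/g) (sum_i w_i h_i - Th)^2,
    driving g -> 0 so that the calibration constraint sum_i w_i h_i = Th
    holds in the limit.  Setting the gradient to zero gives, in closed form,
        w_i = z_i - z_i q_i h_i (h'z - Th) / (g + sum_j z_j q_j h_j^2).
    """
    z = np.asarray(z, float)
    h = np.asarray(h, float)
    q = np.ones_like(z) if q is None else np.asarray(q, float)
    a = z * q
    g = g0
    w = z.copy()
    for _ in range(iters):
        lam = (h @ z - Th) / (g + h @ (a * h))
        w = z - lam * a * h
        g *= rho                 # tighten the penalty as g -> 0
    return w
```

As the penalty parameter shrinks, the constraint violation decays like $g_c$, so the returned weights satisfy the calibration equation to machine precision after a handful of iterations.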

Empirical Analysis and Discussions
In the simulation study, a population of size 10,000 (200 × 50 = 10,000) was simulated from a population structure containing 200 clusters, each of size 50. Each cluster had 5 strata of size 10 each. At stage one, 10, 20, 30, …, 190 clusters were sampled from the 200 clusters by simple random sampling, while at stage two, 5 elements were drawn from each stratum by proportional allocation. This gave a sample of 25 elements from each of the sampled clusters. 10 replications were generated for each sample size. For the penalized spline method, the number of knots and the spline penalty were generated optimally.
Using the R program, a population of independent and identically distributed values of the variable $x$ was simulated from the uniform (0, 1) distribution. The neural network and penalized splines described in section 2.0 of this paper were used to fit the cluster totals as the quadratic function $t_i = 20 + 6x_i^2$, where $t_i$ is the $i$th cluster total and $x_i$ is the auxiliary information known at the cluster level. The cluster element values were generated from the cluster totals with stratum-specific errors: to differentiate the strata from one another, the errors for the five strata were given as $\varepsilon \in (-0.0001, 0.0001)$ for stratum 1, $\varepsilon \in (-0.0002, 0.0002)$ for stratum 2, $\varepsilon \in (-0.0003, 0.0003)$ for stratum 3, $\varepsilon \in (-0.0004, 0.0004)$ for stratum 4 and $\varepsilon \in (-0.0005, 0.0005)$ for stratum 5. This study reports on the performance of the two estimators and their comparison with the Horvitz-Thompson estimator. The performance of the two nonparametric estimators, $\hat{y}_{NN}$ for the neural network and $\hat{y}_{PS}$ for the penalized spline, was evaluated using the relative bias (RB) and relative efficiency (RE). The relative bias is defined as $RB = (\bar{\hat{t}}_y - t_y)/t_y$, where $\bar{\hat{t}}_y$ is the average of the estimates over the replications and $t_y$ is the true population total.
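The simulated population can be sketched as follows. The paper reports the cluster-total model and the stratum error ranges; the way the element values are spread within a cluster here (cluster total divided evenly, plus the stratum error) is an assumption made so that the elements aggregate back to the cluster total, and the paper's own code was in R rather than Python:

```python
import numpy as np

def simulate_population(C=200, strata=5, per_stratum=10, seed=42):
    """Cluster-strata population: x_i ~ Uniform(0,1) per cluster,
    cluster totals t_i = 20 + 6 x_i^2, and element values spread
    evenly within the cluster plus a stratum-specific uniform error
    whose half-width is 0.0001 * (stratum index + 1)."""
    rng = np.random.default_rng(seed)
    n = strata * per_stratum
    x = rng.uniform(0.0, 1.0, C)
    t = 20.0 + 6.0 * x**2
    y = np.empty((C, strata, per_stratum))
    for j in range(strata):
        half = (j + 1) * 1e-4
        eps = rng.uniform(-half, half, (C, per_stratum))
        y[:, j, :] = t[:, None] / n + eps
    return x, t, y
```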

Normality Test
This study carried out a one-sample Kolmogorov-Smirnov normality test for the three estimators, $\hat{y}_{PS}$, $\hat{y}_{NN}$ and $\hat{y}_{HT}$, before their comparative analysis was done. The p-values at α = 0.05 for the quadratic data obtained are shown in table 1 below. Figure 1, figure 2 and figure 3 show a sample of the graphical representations of normality for the quadratic functions. A p-value greater than the α = 0.05 significance level means normality is established. The results show that at α = 0.05 the proposed estimators are normally distributed.
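The normality check can be sketched with `scipy.stats.kstest`. The paper does not state which reference parameters were used for the one-sample test; fitting the sample's own mean and standard deviation, as below, is an assumption:

```python
import numpy as np
from scipy import stats

def ks_normality_pvalue(estimates):
    """One-sample Kolmogorov-Smirnov p-value of the replicated estimates
    against a normal law with the sample's own mean and standard deviation."""
    est = np.asarray(estimates, float)
    return stats.kstest(est, "norm", args=(est.mean(), est.std(ddof=1))).pvalue
```

A p-value above the chosen significance level (0.05 here) is read as consistency with normality; markedly non-normal replicate distributions yield small p-values.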

Results for Population total Estimates
Table 2 below presents the actual total and the estimates from the penalized spline, neural network and Horvitz-Thompson estimators for sample sizes 20, 50, 100 and 150. From the results, the estimator $\hat{y}_{PS}$ gives estimates that are very close to those of the Horvitz-Thompson design estimator for all four sample sizes. Although the neural network estimates are not as close to the Horvitz-Thompson estimates as the penalized spline estimates are, the difference is not large.

Results of Variances and Variance Ratios for Various Sample Size
From table 3, the variance ratios Var($\hat{y}_{NN}$)/Var($\hat{y}_{HT}$) and Var($\hat{y}_{PS}$)/Var($\hat{y}_{NN}$), respectively, are slightly greater than and slightly less than one for the 19 replications. This implies that the estimator $\hat{y}_{NN}$ is slightly more variable than both the $\hat{y}_{HT}$ and $\hat{y}_{PS}$ estimators. The results in figure 4 present the variances of the three estimators; the variances in this figure decrease as the sample size increases, implying that all the estimators are consistent. Figure 5 presents the variance ratio Var($\hat{y}_{PS}$)/Var($\hat{y}_{HT}$); this ratio is concentrated around one, implying that the two estimators are about equally variable. Figure 6 shows that the ratio points are concentrated slightly above one, while for figure 7 they lie slightly below one, implying that $\hat{y}_{NN}$ is slightly more variable than both the $\hat{y}_{HT}$ and $\hat{y}_{PS}$ estimators.

Relative Bias
Table 4 below shows the values of the relative biases. The results show that the relative biases of the three estimators are minimal, given that the population totals were in the thousands, and this points to unbiasedness. Comparing the penalized spline estimator with the corresponding Horvitz-Thompson estimator, the difference is not significant, and both have smaller bias than the neural network estimator.

Results on Mean Squared Errors for Various Sample Sizes
This section presents both the mean squared errors and the relative efficiencies of the three estimators, $\hat{y}_{PS}$ based on the penalized spline, $\hat{y}_{HT}$ based on Horvitz-Thompson and $\hat{y}_{NN}$ based on the neural network, together with their respective graphs. The MSE and relative efficiency of the different estimators are summarized in table 5 and table 6, respectively.
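The Monte Carlo performance measures used in this section reduce to a few one-line computations, sketched below (the function names are illustrative):

```python
import numpy as np

def relative_bias(estimates, true_total):
    """RB = (mean of the replicated estimates - true total) / true total."""
    return (np.mean(estimates) - true_total) / true_total

def mse(estimates, true_total):
    """Monte Carlo mean squared error about the true total."""
    est = np.asarray(estimates, float)
    return float(np.mean((est - true_total) ** 2))

def relative_efficiency(mse_candidate, mse_reference):
    """RE = MSE(candidate) / MSE(reference); values below one favour the candidate."""
    return mse_candidate / mse_reference
```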

Mean Squared Errors for the Three Estimators
Generally, the estimator with the smaller MSE is regarded as the more efficient one. From table 5, the MSEs of $\hat{y}_{PS}$ and $\hat{y}_{HT}$ are smaller than that of $\hat{y}_{NN}$ in some samples. However, for the sample sizes of 30, 50, 60, 140, 170 and 180, $\hat{y}_{NN}$ has a smaller MSE than both $\hat{y}_{PS}$ and $\hat{y}_{HT}$.

MSE and MSE Ratio Graphs for Various Sample Sizes
Results in figure 8 show that the MSEs of the three estimators decrease as the sample size increases, which points to the consistency of the three estimators. Figure 9 and figure 10 show the relative efficiencies of the proposed nonparametric estimators through the ratios MSE($\hat{y}_{PS}$)/MSE($\hat{y}_{HT}$) and MSE($\hat{y}_{NN}$)/MSE($\hat{y}_{HT}$), respectively. The ratio in figure 9 is mostly concentrated slightly below one, and that in figure 10 around one. This implies that both nonparametric estimators compete efficiently with the design estimator $\hat{y}_{HT}$. Figure 11 shows the relative efficiency ratio MSE($\hat{y}_{PS}$)/MSE($\hat{y}_{NN}$); this ratio is also mostly concentrated around one, implying that the two estimators may be equally efficient.

This study can be applied to real-world problems. In sampling, there are cases where some information may be missing due to non-sampling, non-response or non-observation, yet relevant auxiliary information about a variable is available at the cluster level. In such instances, this study recommends the model calibrated estimators as the estimators of choice. This study has shown that where there are missing values at the cluster level but auxiliary information is available at that level, advantage can be taken of this information to obtain cluster totals, which are then used in the estimation of the population total.