Small Area Estimation of Poverty Incidence with Sampling Error Variances Through Generalized Variance Function

Small area estimation based on area level models, particularly the EBLUP method, typically assumes that sampling error variances of the direct survey small area estimates are known. In practice, the sampling error variances are unknown. This paper generates EBLUP estimates of poverty incidence when the sampling error variances are estimated using the generalized variance function (GVF) approach. The precision of the EBLUP estimates is determined using a modified version of the Prasad-Rao MSPE estimator. The modification is made by adding an extra term that would account the uncertainty associated with estimating the sampling error variances. The performance of the modified Prasad-Rao estimator relative to the commonly used Prasad-Rao estimator is evaluated through a simulation study. Results have shown that the modified Prasad-Rao MSPE estimator has relatively greater bias than the commonly used Prasad-Rao MSPE estimator, particularly for small samples. A slight gain in precision is observed when using the modified PR MSPE estimator, especially for large samples. Moreover, the findings imply that estimating sampling error variances using GVF models can be a very useful strategy in the application of EBLUP small area estimation, most particularly in poverty incidence estimation.


Introduction
In the Philippines, nationwide surveys such as the Family Income and Expenditure Survey (FIES), Labor Force Survey (LFS), and the National Nutrition Survey (NNS) are the principal sources of official statistics of the country. These surveys contain regions as sampling domains since it is timeconsuming and expensive to conduct the surveys with domains at the provincial or city levels. Consequently, official statistics with an acceptable level of reliability are produced only for the national and regional levels with urban-rural disaggregation. Since these official statistics are used by the government agencies in the articulation, implementation, and evaluation of their programs, it is then important to derive statistics at levels of disaggregation lower than the region, say provincial or even municipal and city level.
Aside from purely computing these statistics at levels smaller than the domain of the survey, say, at the provincial level, it is equally important to ascertain that these statistics are precise so that proper targeting of the right beneficiaries of these programs is achieved. In this way, government laws and programs can efficiently be delivered to those localities or even households that are really in greatest need. Consequently, government programs can have a greater impact on the people in solving problems efficiently and effectively. However, these alleviation programs can only be successful and effective if the "poor" are properly identified or properly located. For example, official poverty statistics, which are usually derived from the FIES, is being used by the Department of Social Works and Development (DSWD) and the National Anti-Poverty Commission (NAPC) to allocate funds for their respective poverty alleviation programs. The DSWD used the 2003 Small Area Estimates of poverty incidence generated by the National Statistical Coordination Board (NSCB) as one of the criteria for identifying eligible households for its Conditional Cash Transfer Program, otherwise known as Pantawid Pamilyang Pilipino Program (4Ps).
Haslett, et al., [5] expressed that small area estimation is a mathematical and statistical method that models data gathered from one or more data sources to produce estimates, like poverty, that is more accurate at small area level than using data collected from each small area. Moreover, the additional accuracy is achieved in many such models by "borrowing strength" for the estimate of a particular small area by using information from areas to which it is similar. For some small area estimation techniques combine data taken from different sources. For instance, census and information from new survey can be combined to update estimates from the original census. Alternatively, and this is more usually the case for malnutrition estimates, a statistical model is fitted to survey data gathered around the same time as the census. In addition, this model is useful to predict a variable not collected in the census, but based on variables that were collected in both survey and census.
However, they stressed further that survey based poverty statistics from smaller geographical areas or lower administrative level were usually less reliable because it has higher standard errors due to smaller sample sizes. This is why small area estimation comes into play (Haslett, et al., [5]). One way to generate more reliable small area statistics without increasing the cost of conducting a nationwide survey is by the use of small area estimation techniques. Recent developments in this field made use of the modelbased approach in which a predicting model was formulated using nationwide survey and census data sets and the resulting model was used to get the estimates of the small area. According to Albacea [1], the model-based estimates and the design-based estimates were then combined to produce estimates, commonly referred to as empirical best linear unbiased prediction (EBLUP) estimates. The EBLUP estimates were said to have the good properties of both the design-and model-based estimates.
The EBLUP approach which is oftentimes based on the Fay and Herriot (FH) model (Fay and Herriot, [6]) required that the sampling error variance of the direct estimates is available or known. But this may not be the case in practice since sampling error variances were most of the time unknown and have to be estimated based on sample data. The direct estimates of the sampling variances were based on the survey data and, as such, they were also subject to errors. Since the sampling variances play an important role in the construction of the EBLUP, the uncertainty involved in their estimation must be quantified and incorporated in the estimation of the MSPE of the EBLUP.
A modification of the EBLUP approach is studied and the commonly used measure of the precision of EBLUP estimates is correspondingly modified in this paper.

Data Sets
The data sets used in the study are from the 2000 FIES and the 2000 Census of Population and Housing (CPH). In the Philippines, poverty statistics, such as poverty incidence, are generated from the FIES data set. Bogin, et al., [3] defined poverty incidence as the ratio of the total number of poor households to the total number of households. These were the data sets used since, in general, small area estimation works best if the sources of primary and auxiliary data are conducted in the same year. In this paper, the source of primary data is the FIES (2000) and the source of auxiliary data is the CPH (2000).
The FIES of the Philippines from 1957 to 1975 was religiously gathered every five years. Nevertheless, in 1985, a new series of FIES had begun and the gap of conducting the survey was reduced to three years. Consequently, the nationwide survey of households is undertaken by the Philippine Statistics Authority (PSA). It provides essential data on family income and expenditure which include among other levels of consumption by an item of expenditure as well sources of income in cash and in kind. Specifically, it discusses the levels of living and disparities in income and spending patterns of families belonging to different income groups. Also includes related information such as a number of family members employed for pay or profit (or as wage, salary, or own-account workers); occupation, age, and educational attainment of household head; and other housing characteristics (PSA, [8]).
The 2000 Census of Population and Housing is also known as Census 2000 is the 11 th census of population and the 5th census of Housing undertaken in the Philippines since 1903. Census 2000 was designed to take an inventory of the total population and housing units in the Philippines and to collect information about their characteristics. The census of population is the source of information on the size and distribution of the population as well as information about the demographic, social, economic and cultural characteristics. On the other hand, the census of housing provides data on the supply of housing units, their structural characteristics, and facilities which have bearing on the maintenance of privacy, health and the development of normal family living conditions. These information are vital for making rational plans and programs for national and local development (PSA, [9]).

Modeling the Sampling Error Variances
The Fay-Herriot (FH) model (Fay and Herriot, [6]) assumed that the sampling variances are known in the model. In practice, however, sampling variances are unknown and are estimated directly from the survey data. This being the case, just like the direct estimates of the parameter of interest, the direct estimates of the sampling variances are also subject to errors. If in fact, direct estimates ɵ i θ are very imprecise due to small sample sizes for some or all areas, the corresponding sampling variance estimatesˆi ψ are also expected to be very imprecise due to the same sample size limitations (Bell, [2]). There is a limited number of studies have attempted to find solutions to this problem.
Bell [2] argued that one such solution is modeling ˆi ψ to develop improved estimates of the i ψ and to quantify their estimation error. Modeling the estimated sampling variances may result in improved measures of the MSPE. In addition, Wolter [14] introduced the method of generalized variance function (GVF), which is commonly used to obtain smoothed estimates of sampling variances. It is a mathematical model describing the relationship between the variance or relative variance of a survey estimator and its expectation. Likewise, Valiant [13] stressed that the principal advantage of this method is having quick estimates of sampling variances. In addition, these estimates are more stable and precise than the direct ones. The common GVF model specification is where X j is a vector of predictor variables potentially relevant to estimators of V pj , q j is a univariate "equation error" with mean zero and γ j is a vector of variance function parameters which needs to be estimated (Cho, et al., [4]).

Estimating Poverty Incidence
Using the FIES (2000) data, estimates of poverty incidence at the provincial level are generated based on the sampling design it provides. These estimates are referred to as the designed-based or direct estimates of poverty incidence. Generalized variance function method is then applied to model the sampling error variances of the direct estimates where the data from the 2000 CPH is used as the source of auxiliary variables. Again using the 2000 CPH as a source of predictors, regression-based estimates of poverty incidence are generated. The method of moments is used to estimate the model variance. Finally, the direct and the regressionbased estimates are combined to come up with EBLUP estimates of poverty incidence.
Two EBLUP estimation methods were employed in this study. Method 1 assumes the sampling error variances are known and are equal to the approximate mean square error of the design-based poverty incidence estimates, while Method 2 uses GVF estimates of the sampling error variances. The precision of the EBLUP estimates using Method 1 is measured using the Prasad-Rao MSPE estimator; while, a modified version of the Prasad-Rao MSPE estimator is used to measure the precision of the EBLUP estimates using Method 2. The EBLUP estimators, together with their corresponding MSPE estimators, are given below.

Method 1 (Known sampling error variances)
Under the basic area-level model of Fay and Herriot [6], the EBLUP estimator of the poverty incidence of the i th province ( i p ) is is the method of moments estimate of the model variance, i ψ is the known sampling error variance, ˆi p is the design-based estimator of i p , and i x is a set of auxiliary variables which are linearly related to ˆi p . Prasad-Rao [10] proposed an estimator of the precision of the EBLUP estimator in (2) as follows:

Method 2 (Sampling error variances modeled by GVF)
The EBLUP estimator of i p when sampling error variances were estimated using GVF method is ( )

Simulation Study
A simulation study is conducted to evaluate the performance of the modified Prasad-Rao MSPE estimator relative to the commonly used Prasad-Rao MSPE estimator. Treating the sample of 77 provinces as a pseudo-population, 1000 simple random samples of sizes 20 and 60 are drawn.
The performances of the two MSPE estimators are compared in terms of the following criteria: bias, absolute relative bias, and mean square error.

GVF models of Sampling Error Variances
Generalized variance function (GVF) method, as suggested by Wolter [14], is applied to come up with a model to estimate the sampling error variances of the direct estimates of poverty incidence. This approach addresses the issue on unknown sampling error variances. Several variations of the GVF models given in Wolter [14] are tried in the model-building process. Since the researcher is also interested in coming up with estimates of sampling error variances even in the absence of survey data for a particular small area, predictor variables from auxiliary data sources, such as the 2000 CPH, are also considered. To identify the variables to be included in the models, stepwise and backward elimination procedures are used.
The summary of the goodness-of-fit statistics for the models considered in the study is shown in Table 1. In Models 1, 2, and 3, 1 X is the reciprocal of the provincial total number of households headed by a female person having no spouse and is at least high school undergraduate, 2 X is the reciprocal of the average household size in a province, and 3 X is the reciprocal of the total number of barangays with postal service. Model 1 essentially follows the model of Wolter [14]. Although the properties of Model 1 are quite acceptable, the drawback with Model 1 is that it produces negative estimates of the relative variance, hence, negative estimates of the mean square error. To correct this problem, Model 2 is considered. The values of R 2 and adjusted R 2 substantially increase but there is still one province with negative predicted relative variance. Upon thorough inspection of the residuals, two outlying observations are detected. Deleting these two observations and refitting the same model, the R 2 and adjusted R 2 values increase slightly and the root mean square error (RMSE) and the Akaike Information Criterion (AIC) decreased substantially. Furthermore, the predicted values of the relative variance using Model 3 are all positive.
Estimates generated using Model 3 are plugged into (4) and (5) to generate the modified EBLUP estimates and their associated coefficients of variation (according to Method 2).

EBLUP Estimates of Province-Level Poverty Incidence and Their Coefficients of Variation
Two sets of EBLUP estimates of poverty incidence of the 77 provinces (excluding cities) of the Philippines are generated. The distribution of the EBLUP estimates of poverty incidence is shown in Table 2. Most of the provinces, 30 and 29 out of 77, respectively for Method 1 and Method 2, have poverty incidence between 40% to 50% while 27.27% and 24.68% of the provinces have poverty incidence greater that 50%, respectively, for Method 1 and Method 2. It is shown in Table 1 that there are only two (2) provinces with poverty incidence as high as 10% using both methods of estimation. From Appendix Table 1, the Province of Batanes has the lowest poverty incidence estimate (4.17% for Method 1 and 4.16% for Method 2), while the highest poverty incidence estimate is that of the Province of Sulu (70.92% and 68.29%, respectively for Methods 1 and 2). These results are in conjunction with that of Albacea [1] which indicates, among others, that Sulu is the poorest province in the Philippines in 2000.
On the average, provincial poverty incidence is estimated at 41.93% and 41.07%, respectively for Method 1 and Method 2. Wilcoxon signed rank test indicated that, on the average, the estimates generated using Method 1 is statistically higher than the estimates generated using Method 2 (W z =4.64, p<0.01). The distribution of the coefficients of variation of the EBLUP estimates of poverty incidence is presented in Table  3. About 65% and 69% of the provinces have very reliable estimates of poverty incidence (CV≤10%), respectively for Methods 1 and 2. This shows that Method 2 produced more reliable estimates than Method 1. On the average, the coefficients of variation of the EBLUP estimates produced using Method 2 (9.84%) are higher, but less dispersed, than the coefficients of variation of the EBLUP estimates generated using Method 1 (9.69%). On the other hand, a test of significance indicates that there is no significant difference in the coefficients of variation of the EBLUP estimates generated using the two methods (W z =0.78, p>0.05).
From Appendix Table 1, the Province of Batanes has the least reliable estimate of poverty incidence with a coefficient of variation of 62.21% and 62.26%, respectively for Methods 1 and 2; while, the provinces of Masbate and Guimaras have the most reliable estimates with coefficients of variation of 4.44% and 4.85%, respectively for Methods 1 and 2. Based on EBLUP estimates, the eight poorest provinces are listed in Table 4. For both methods of estimation, half of the eight poorest provinces are from Region 15 and the rest are from Regions 4B, 5, 13, and 14. For Method 1, all of the 8 poorest provinces have very reliable poverty incidence estimates (CV<10%). Meanwhile, for Method 2, only 6 out of the 8 poorest provinces have very reliable estimates of poverty incidence. It can be gleaned in Table 4 that the composition of the eight poorest provinces is the same for Methods 1 and 2. For both methods of estimation, Sulu, Romblon, and Agusan del Sur are consistently the first, fourth, and eighth poorest provinces, respectively; but there is a repositioning of the other provinces from Method 1 to Method 2.

Results of the Simulation Study
The statistical properties of the two MSPE estimators (3) and (5) are compared through a simulation study. The results are shown in Table 5. In terms of bias, on the average, the modified Prasad-Rao (M-PR) estimator is significantly more biased than the Prasad-Rao [10] (PR) estimator (W z =5.146, p<0.01). Likewise, in terms of absolute relative bias, on the average, the M-PR estimator is significantly more biased than the PR estimator (W z =6.66, p<0.01).
For small samples, on the average, the M-PR estimator is significantly less precise than the PR estimator (W z =4.83, p<0.01). However, for large samples, on the average, there is no significant difference in the level of precision of the two estimators (W z =1.52, p>0.05).
Finally, increased sample size leads to decrease in the values of the three evaluation criteria.

Conclusions and Recommendation
Modeling sampling error variances using GVF approach has been shown to be a potential technique for use in EBLUP estimation of poverty incidence. The technique has produced EBLUP estimates which are statistically lower than the EBLUP estimates obtained assuming known sampling error variances. Further, the GVF approach has led to a slight, though not statistically significant, increase in the number of provinces with very reliable estimates.
The results of the simulation study suggest that the modified version of the Prasad-Rao MSPE estimator did not perform well relative to the commonly used Prasad-Rao estimator in terms of bias and absolute relative bias. However, in terms of the MSE, there is a slight gain in precision using the GVF estimated sampling error variances, especially for large samples. These imply that estimating sampling error variances using GVF models can be a very useful strategy in the application of EBLUP small area estimation, most particularly in poverty incidence estimation. However, it is suggested that a thorough study should be made to further modify and improve the Prasad-Rao estimator to completely capture the variability associated with modeling the sampling error variances using the GVF approach.