Spatial Econometric Model of Poverty in Java Island
Mulugeta Aklilu Zewdie^{1}, M. Nur Aidi^{2}, Bagus Sartono^{2}
^{1}Department of Statistics, Faculty of Natural and Computational Science, Mekelle University, Mekelle, Ethiopia
^{2}Department of Statistics, Faculty of Mathematics and Science, Bogor Agricultural University, Bogor, Indonesia
To cite this article:
Mulugeta Aklilu Zewdie, M. Nur Aidi, Bagus Sartono. Spatial Econometric Model of Poverty in Java Island. American Journal of Theoretical and Applied Statistics. Vol. 4, No. 6, 2015, pp. 420-425. doi: 10.11648/j.ajtas.20150406.11
Abstract: This paper introduces the concept of the spatial econometric model and applies it to analyze the spatial dimensions of poverty and its determinants, using data from the 2010 census survey for 105 districts of Java Island. The dependent variable is the poverty rate (in percent) of each district, and the predictors are selected variables that are correlated with poverty. The spatial weight matrix is obtained using the queen contiguity criterion, and four statistical models are fitted to the data: the Ordinary Least Squares (OLS) regression model, the Spatial Error Model, the Spatial Lag Model and the Spatial Durbin Model. The OLS estimates of the poverty function are shown to suffer from spatial effects, which indicates that the OLS model is misspecified; the Moran's I test also confirms the existence of spatial autocorrelation. LM and Robust LM tests are used to test for the existence of spatial effects. The likelihood ratio common factor test and AIC are used as model selection criteria. The Gauss-Markov assumptions are checked, and the Spatial Lag Model proves to be better than the other models for the given data. The results show that education and working hours have a significant impact on poverty.
Keywords: Poverty, Spatial Effects, Econometrics, Spatial Error Model, Spatial Lag Model, Spatial Durbin Model, LM, Robust LM, LRcom, Gauss-Markov & AIC
1. Introduction
Poverty is pronounced deprivation in well-being. It is a broad front. It is about income levels. It is about food security. It is about quality of life. It is about asset bases. It is about human resource capacities. It is about vulnerabilities and coping. It is about gender inequalities. It is about human security. It is about initiative horizons. It is each of these and all of these together. An economic approach to poverty frequently measures poverty quantitatively in terms of per capita consumption, income levels or caloric intake. Such measures, used by the World Bank and the UN, reflect the minimum income or consumption necessary to meet basic needs. For low-income countries, the World Bank has calculated poverty lines between $1 and $2 a day. Although these minimum requirements vary across countries and over time, the $1 and $2 a day measures allow policy makers to compare poverty across countries using the same reference point.
Poverty is one of the fundamental problems at the center of attention of governments around the world, especially in developing countries such as Ethiopia and Indonesia. The focus of this study is Indonesia; poverty in Ethiopia, the first author's home country, is left for a subsequent study. Indonesia's poverty line is determined by a complex function that takes in what the poor spend on different kinds of food to reach 2,100 calories per day, as well as the costs of dozens of non-food goods, including housing, clothing, education and health care. The poverty line is established as an average, allowing for the fact that prices vary widely between urban and rural areas and across more and less prosperous Indonesian regions. The government's official poverty line is 233,740 rupiah per capita per month, which is close to the UN poverty line of 1 to 2 dollars a day. According to the National Development Planning Agency (Bappenas), the number of people living below the poverty line in Indonesia is still too high. In 2010, Bappenas put the number of poor people in Indonesia at around 31.02 million. Bappenas also noted that more than half of the total poor population of Indonesia, around 55.83%, lived on Java Island (Bappenas 2010). Java Island is the most populous island in Indonesia. It consists of 6 provinces, namely the Special Capital Region of Jakarta, West Java, Banten, Central Java, Yogyakarta and East Java. Each province consists of several districts. One of the efforts made to address the problem of poverty is to identify the variables that affect poverty in these districts. Poverty studies have, for some time, sought to disaggregate the poor in order to refine the understanding of the causes of poverty and to design effective interventions.
Objectives:
a. To identify the variables that significantly determine poverty
b. To make policy recommendations to prevent and alleviate poverty
c. To compare the traditional econometric model with the spatial econometric models and select the best one
2. Literature Review
Levernier identified regionalism as a key element affecting poverty and concluded that economic development targeting counties with predominantly African-American communities would be most effective in alleviating poverty. Triest concluded that increased employment among low-income groups would narrow the interregional gap in poverty. Goetz suggested that government can increase investment in social capital to reduce the poverty rate by easing the transaction costs paid by local associations. Findeis found that welfare assistance to help poor workers had effects on poverty in metro areas. Mauro found that poor countries tend to have corrupt bureaucracies and political instability (McKay & P. 2011).
According to researchers, the key factors correlated with poverty fall into three groups. Regional-level characteristics include vulnerability to flooding or typhoons, remoteness, quality of governance, and property rights and their enforcement. Community-level characteristics include the availability of infrastructure (roads, water, electricity) and services (health, education), proximity to markets, and social relationships. Household and individual characteristics, which are among the most important, are demographic (household size, age structure, dependency ratio, gender of the household head), economic (employment status, working hours, property owned) and social (health and nutritional status, education and shelter).
3. Research Methods
3.1. Data
The data were collected by BPS Indonesia in 2010. The response variable in this study is the poverty rate (in percent) of each of the 105 districts. To make the response and explanatory variables continuous, all variables were converted to percentages. The explanatory variables included in this study, assumed to be correlated with poverty, are:
X1: percentage unemployment rate
X2: percentage malnutrition rate
X3: percentage child mortality rate
X4: percentage morbidity (occurrence of disease)
X5: percentage of households with more than high school education
X6: percentage with access to clean water
X7: percentage without sanitation
X8: percentage literacy rate
X9: percentage employment rate
X10: percentage of unworked hours per week
X11: percentage of households with health complaints
X12: length of sickness
3.2. Multiple Linear Regressions
A simple linear regression model is not adequate for modeling many economic phenomena, because explaining an economic variable requires taking more than one relevant factor into account. The multiple linear regression model is given by the following expression (Rawlings 1998):
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$ (1)
Where $y_i$: percentage of poverty rate in the i-th district
$\beta_0, \beta_1, \dots, \beta_k$: regression parameters
$x_{i1}, x_{i2}, \dots, x_{ik}$: predictor variables
$\varepsilon_i$: random error term, iid with mean zero and constant variance.
When spatial autocorrelation exists, the error term and the dependent variable of the classical linear regression model above have to take it into account (Anselin 2001), and spatial models should be considered: because the independence assumption is violated, the resulting regression parameter estimates can be biased and inconsistent, and the R-squared value is not an accurate goodness-of-fit measure.
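As a minimal sketch of how the multiple linear regression above can be fitted by ordinary least squares, the following uses NumPy on a small toy data set. The function name `fit_ols` and the toy numbers are illustrative assumptions, not the study's data or its R code.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) and residuals for y = b0 + X b + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept beta_0.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals

# Toy data (hypothetical): y = 1 + 2*x1 - 3*x2 exactly, so OLS recovers these values.
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta_demo, resid_demo = fit_ols(X_demo, y_demo)
```

The residuals returned here are the quantities on which spatial diagnostics such as Moran's I and the LM tests are later computed.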
The weight matrix (W) is an n x n row-standardized matrix that defines which observations are neighbors of which, and reflects the intensity of the geographical relationship between observations in a neighborhood. This research uses a contiguity weight matrix based on the queen criterion: two regions are neighbors if they share any part of a common border, no matter how short.
$w_{ij} = 1$ if districts $i$ and $j$ ($i \neq j$) share a common border or vertex, and $w_{ij} = 0$ otherwise; after row standardization, $w^{*}_{ij} = w_{ij} / \sum_{j=1}^{n} w_{ij}$ (2)
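To illustrate the queen criterion and row standardization, the sketch below builds the weight matrix for cells of a regular grid. The grid is a hypothetical stand-in for a real district map (where contiguity would be derived from polygon boundaries), and the function name `queen_weights_grid` is an assumption made for this example.

```python
import numpy as np

def queen_weights_grid(n_rows, n_cols):
    """Row-standardized queen contiguity matrix for cells of an n_rows x n_cols grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Queen criterion: the surrounding cells that share an edge or a corner.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    # Row standardization: divide each row by its number of neighbors.
    return W / W.sum(axis=1, keepdims=True)

W_demo = queen_weights_grid(3, 3)
```

In this 3 x 3 example the center cell has eight neighbors, each with weight 1/8, while a corner cell has three neighbors, each with weight 1/3.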
3.3. Spatial Autocorrelation Test
Spatial autocorrelation is a situation in which the dependent variable or error term at each location is correlated with observations of the dependent variable or values of the error term at other locations. It measures how similar nearby objects are compared with other nearby objects. One of the most common tests for the existence of spatial autocorrelation is Global Moran's I, which is computed from a weight matrix and a given data vector y or residual vector.
Moran test statistic:
$Z(I) = \frac{I - E(I)}{\sqrt{Var(I)}}$ (3)
Where $I$ is the Moran index of the percentage poverty rate, $E(I)$ is the expected value of the Moran index and $Var(I)$ is the variance of the Moran index.
Hypothesis:
$H_0: I = 0$ (no autocorrelation)
$H_1: I \neq 0$ (there is positive or negative autocorrelation, depending on the sign of $I$)
Reject $H_0$ if $|Z(I)| > Z_{\alpha/2}$
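A minimal sketch of Global Moran's I follows, using the usual definition I = (n / S0) z'Wz / z'z, where z is the mean-centered data vector and S0 is the sum of all weights. As a stand-in for the analytical z-test, the second function uses a permutation test, which is a different technique from the normal approximation in equation (3); the function names, the four-region chain and the seed are assumptions for illustration only.

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I of vector y for spatial weight matrix W."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

def moran_permutation_test(y, W, n_perm=999, seed=0):
    """Return (observed I, E(I) under no autocorrelation, two-sided pseudo p-value)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = morans_i(y, W)
    expected = -1.0 / (len(y) - 1)
    # Reshuffle values across locations to approximate the null distribution of I.
    perms = np.array([morans_i(rng.permutation(y), W) for _ in range(n_perm)])
    extreme = np.sum(np.abs(perms - expected) >= abs(observed - expected))
    return observed, expected, (extreme + 1) / (n_perm + 1)

# Hypothetical chain of four regions (1-2-3-4), row-standardized.
W_chain = np.array([[0.0, 1.0, 0.0, 0.0],
                    [0.5, 0.0, 0.5, 0.0],
                    [0.0, 0.5, 0.0, 0.5],
                    [0.0, 0.0, 1.0, 0.0]])
i_obs, i_exp, p_perm = moran_permutation_test([1.0, 2.0, 3.0, 4.0], W_chain)
```

For this toy chain the values rise steadily from one end to the other, so I is positive (0.4), consistent with similar values sitting next to each other. With only four regions the permutation p-value is too coarse to be meaningful; it is shown only to indicate the mechanics.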
Positive spatial autocorrelation occurs when similar values cluster together on a map, and negative spatial autocorrelation occurs when dissimilar values cluster together. Spatial autocorrelation matters because standard statistical inference relies on observations being independent of one another. If autocorrelation exists in a map, this independence is violated. When significant spatial autocorrelation (spatial dependence, or lack of independence in spatial data) exists, either globally or locally, spatial heterogeneity (an uneven distribution of the relationship across a region) may also exist, and with it non-constant error variance (Anselin 1988, 2010).
The spatial autocorrelation model combines the spatial lag model and the spatial error model; it is often called the simultaneous autoregressive model or general spatial model (Lesage, 2009).
$y = \rho W y + X\beta + u$, with $u = \lambda W u + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I_n)$ (4)
Where: $\lambda$: spatial error coefficient; $\rho$: spatial lag coefficient
W: n x n spatial weight matrix
y: vector of the response variable (n x 1)
X: matrix of predictor variables (n x (k+1))
u: error vector (n x 1)
$\varepsilon$: vector of uncorrelated error terms (n x 1)
3.4. Spatial Lag Model
Restricting the spatial error parameter of the spatial autocorrelation model to zero (λ = 0) yields the spatial lag model, or spatial autoregressive (SAR) model, which is analogous to a time-series model with a lagged dependent variable:
$y = \rho W y + X\beta + \varepsilon$ (5)
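The sketch below shows one common way to estimate the spatial lag model by maximum likelihood: the likelihood is concentrated with respect to β and σ², and ρ is found by a grid search. It is an illustration under stated assumptions, not the routine used in the study (which was run in R); the function name `fit_sar_ml`, the ring-shaped neighborhood and the simulated data are all hypothetical.

```python
import numpy as np

def fit_sar_ml(y, X, W, grid=None):
    """Grid-search ML for y = rho*W*y + X*beta + eps; returns (loglik, rho, beta, sigma2)."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)  # candidate rho values, step 0.01
    n = len(y)
    Wy = W @ y
    # Regress y and Wy on X; the concentrated likelihood depends only on these residuals.
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    bL, *_ = np.linalg.lstsq(X, Wy, rcond=None)
    e0, eL = y - X @ b0, Wy - X @ bL
    best = None
    for rho in grid:
        e = e0 - rho * eL
        sigma2 = (e @ e) / n
        # Log-determinant of the Jacobian term |I - rho*W|.
        _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
        loglik = logdet - 0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
        if best is None or loglik > best[0]:
            best = (loglik, rho, b0 - rho * bL, sigma2)
    return best

# Simulated example: 100 regions on a ring, each with two neighbors (row-standardized).
rng = np.random.default_rng(42)
n = 100
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.5
eps = rng.normal(scale=0.05, size=n)
# Reduced form of the SAR model: y = (I - rho*W)^{-1} (X*beta + eps).
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)
loglik_hat, rho_hat, beta_hat, sigma2_hat = fit_sar_ml(y, X, W)
```

A grid search is used here only because it is easy to read; production software typically maximizes the same concentrated likelihood with a one-dimensional optimizer.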
3.5. Spatial Error Model
When ρ in the spatial autocorrelation model is set to 0, a spatial error model (SEM), in which the spatial effect enters through the error term, is obtained in the form:
$y = X\beta + u$, with $u = \lambda W u + \varepsilon$ (6)
3.6. Spatial Durbin Model
The Spatial Durbin Model (Lesage, 1999) is:
$y = \rho W y + X\beta + W X \theta + \varepsilon$ (7)
Where $\theta$ is the vector of regression coefficients of the exogenous spatial lags $WX$. This simply adds the average neighbor values of the independent variables to the specification. The parameters of all the above models are estimated by maximum likelihood, except for the multiple linear regression model, which is estimated by OLS.
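The relationship between the Spatial Durbin Model and the spatial error model can be made explicit with a short derivation. The notation below (ρ for the spatial lag coefficient, λ for the spatial error coefficient, θ for the coefficients of the lagged regressors) follows the standard textbook convention and is assumed to match the paper's.

```latex
\begin{aligned}
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\Rightarrow\quad & (I - \lambda W)(y - X\beta) = \varepsilon \\
\Rightarrow\quad & y = \lambda W y + X\beta - \lambda W X \beta + \varepsilon \\
\text{SDM:}\quad & y = \rho W y + X\beta + W X \theta + \varepsilon
\end{aligned}
```

Comparing the last two lines, the spatial error model is the special case of the Spatial Durbin Model with ρ = λ and θ = -ρβ. This restriction is what the common factor test examines.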
Several diagnostic tests can be used to test the significance of spatial effects. The Lagrange Multiplier tests (LM-lag and LM-error) test for spatial dependence; residual plots and residual maps are also examined to locate extreme values and reveal heterogeneity, globally and locally.
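Lagrange Multiplier Test for Spatial Error (LM-error) (Anselin 2001, 2010)
Hypothesis: $H_0: \lambda = 0$ (no spatial error effect)
$H_1: \lambda \neq 0$ (there is a spatial error effect)
Test statistic:
$LM_{error} = \frac{(e' W e / \hat{\sigma}^2)^2}{T}$, where $e$ is the vector of OLS residuals, $\hat{\sigma}^2 = e'e/n$ and $T = tr[(W' + W)W]$ (8)
Reject $H_0$ if LM-error > $\chi^2_{(\alpha, 1)}$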
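Lagrange Multiplier Test for Spatial Lag (LM-lag) (Anselin 2001)
Hypothesis: $H_0: \rho = 0$ (there is no spatial lag effect)
$H_1: \rho \neq 0$ (there is a spatial lag effect)
Test statistic:
$LM_{lag} = \frac{(e' W y / \hat{\sigma}^2)^2}{D}$
Where $e$ is the vector of OLS residuals, $\hat{\beta}$ the OLS estimates, $\hat{\sigma}^2 = e'e/n$ and
$D = \frac{(W X \hat{\beta})' M (W X \hat{\beta})}{\hat{\sigma}^2} + tr[(W' + W)W]$, with $M = I - X(X'X)^{-1}X'$ (9)
Reject $H_0$ if LM-lag > $\chi^2_{(\alpha, 1)}$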
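The two LM statistics can be computed directly from an OLS fit. The sketch below uses the standard Anselin (1988) forms, LM-error = (e'We/σ²)²/T and LM-lag = (e'Wy/σ²)²/D with T = tr[(W'+W)W] and D = (WXβ)'M(WXβ)/σ² + T; these formulas, the function name `lm_tests` and the simulated ring data are assumptions for illustration rather than the study's code.

```python
import numpy as np

def lm_tests(y, X, W):
    """Return (LM-error, LM-lag) computed from the OLS fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = (e @ e) / n
    T = np.trace((W.T + W) @ W)
    lm_error = (e @ W @ e / sigma2) ** 2 / T
    WXb = W @ X @ beta
    # M projects onto the space orthogonal to the columns of X.
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    D = (WXb @ M @ WXb) / sigma2 + T
    lm_lag = (e @ W @ y / sigma2) ** 2 / D
    return lm_error, lm_lag

# Hypothetical data: 30 regions on a ring, no spatial effect built in.
rng = np.random.default_rng(1)
n = 30
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
lm_error, lm_lag = lm_tests(y, X, W)

CHI2_1DF_5PCT = 3.841  # 5% critical value of the chi-square distribution with 1 df
reject_error = lm_error > CHI2_1DF_5PCT
reject_lag = lm_lag > CHI2_1DF_5PCT
```

Each statistic is compared with the chi-square critical value with one degree of freedom; the robust versions of the tests adjust each statistic for the possible presence of the other form of dependence and are not shown here.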
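Likelihood Ratio Common Factor Test
Hypothesis: $H_0: \theta = -\rho\beta$ vs $H_1: \theta \neq -\rho\beta$
Test statistic:
$LR_{com} = 2(L_u - L_r)$ (10)
Where $L_u$ and $L_r$ are the maximized log-likelihoods of the unrestricted model (Spatial Durbin Model) and the restricted model (spatial error model), respectively (Angulo, A. (2006)). $H_0$ is rejected or not rejected according to the p-value.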
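As a worked example of the p-value criterion, the sketch below evaluates the upper tail of the chi-square distribution for a likelihood ratio statistic. It uses the LR value reported in the results table of this paper (23.134); the degrees of freedom (12, one restriction per explanatory variable) are an assumption, chosen because they reproduce the reported p-value. The helper names are hypothetical, and the closed form used is valid for even degrees of freedom only.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x); closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1, built up term by term.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def lr_common_factor(loglik_unrestricted, loglik_restricted, df):
    """Return (LR statistic, p-value) for LR = 2 * (L_u - L_r)."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, chi2_sf_even_df(lr, df)

# Reported LR statistic; df = 12 is an assumption (see the lead-in above).
p_reported = chi2_sf_even_df(23.134, 12)
reject_common_factor = p_reported < 0.05
```

With these inputs the p-value is about 0.027, so the common factor restriction is rejected at the 5% level, in line with the reported result.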
Model selection can help to identify a single best model or to make inferences from a set of competing hypotheses. So far, however, only a few model selection procedures have been tested for spatially autocorrelated and spatial lag data. The best model among OLS, SDM, SAR and SEM is therefore selected using the Akaike information criterion (AIC):
$AIC = -2L + 2p$ (11)
Where $L$ is the maximized log-likelihood of the model and p is the number of coefficients in the regression equation, normally equal to the number of independent variables plus 1 for the intercept term.
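The selection rule is simply "smallest AIC wins". The sketch below applies it to the AIC values reported in the results table of this paper; the function `aic` assumes the usual definition AIC = -2 log L + 2p, and the variable names are illustrative.

```python
def aic(loglik, n_params):
    """Akaike information criterion from a maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params

# AIC values as reported in the results table (Section 4).
reported_aic = {"OLS": 68.614, "SAR": 43.838, "SEM": 43.991, "SDM": 44.858}
best_model = min(reported_aic, key=reported_aic.get)
# Differences from the best model show how close the alternatives are.
delta_aic = {name: round(value - reported_aic[best_model], 3)
             for name, value in reported_aic.items()}
```

The differences between the three spatial models are small (about 0.15 and 1.02), while OLS is far behind, which is why the AIC is read here together with the LM and LR tests rather than on its own.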
Finally, the best model is checked against the Gauss-Markov assumptions and for multicollinearity. For homogeneity of the error variance, the Breusch-Pagan test is used to test homoscedasticity against heteroscedasticity. For normality of the error term, the Kolmogorov-Smirnov test is used to test normality against non-normality. The independence of the error term is checked using the Durbin-Watson test and the Moran test.
3.7. Steps of Analysis (Software Used: R)
1. Explore the data with graphs and descriptive statistics
2. Fit the multiple linear regression model using OLS estimation
3. Create the row-standardized weight matrix using the queen contiguity criterion
4. Test for the existence of spatial autocorrelation using the Moran's I test
5. Test for spatial lag and spatial error effects using the LM and Robust LM tests
6. Fit the Spatial Lag Model, the Spatial Error Model and the Spatial Durbin Model
7. Under the Spatial Durbin Model, apply the LR common factor test to check whether it reduces to a restricted model
8. Select the best model using the model selection criteria and test the assumptions on the residuals
4. Result and Discussions
The descriptive statistics cover all Java Island provinces with their respective numbers of districts: DKI Jakarta (5), West Java (21), Banten (7), Central Java (33), DIY Yogyakarta (5) and East Java (34). The remaining districts are not included because they have very low poverty rates, because BPS did not collect data on the explanatory variables for them, or because their spatial effect is negligible when they are geographically far from neighboring jurisdictions. According to Tobler's first law, "Everything is related to everything else, but near things are more related than distant things." A bar chart is also used to identify which provinces have the highest and lowest poverty rates without considering their districts; it gives an overall view of the poverty rate in each of the Java Island provinces.
In the bar chart, DKI Jakarta, Banten, West Java, East Java, Central Java and DIY Yogyakarta are labeled 1 to 6, respectively. According to the 2010 household survey, the poverty rate among the Java Island provinces was highest in DIY Yogyakarta, followed by Central Java and East Java, while DKI Jakarta had a relatively low poverty rate. The next step is to examine the factors that affect the poverty rate in Java Island. Some researchers have found that poverty is worse in rural than in urban areas, which may explain the lower poverty rate in DKI Jakarta; the results below shed light on this. The following step is to create the 105 x 105 weight matrix and then test for the existence of spatial autocorrelation and spatial dependence. If there is no spatial autocorrelation, the classical linear regression model is retained and the conclusions and recommendations are based on it.
The Moran's I statistic (0.56) indicates positive autocorrelation in the poverty data. Its p-value (2.2e-16) is far below 0.05, so the null hypothesis stated in the research methods is rejected, and we conclude that there is positive spatial autocorrelation in the poverty data: high poverty rates in one locality are associated with high poverty rates in neighboring localities, and low rates with low rates. Equivalently, Moran's I (0.56) can be interpreted as the correlation between the poverty rate and its spatial lag (Wy), formed by averaging the poverty rates of the neighboring polygons. Given the presence of spatial autocorrelation, the next step is to test for the form of spatial dependence. First, check the significance of the Lagrange Multiplier (LM) tests, which test for the presence of spatial dependence. If only one of them (lag or error) is significant, fit the corresponding model. If both are significant, check the Robust LM tests, which indicate which form of dependence is at work. If only one Robust LM test is significant, fit the corresponding model; if both are significant, choose the model with the larger test statistic. Here both LM tests are significant, so both spatial lag and spatial error dependence are indicated and the OLS model is not appropriate. Following Anselin, we therefore turn to the Robust LM tests, in which the lag test remains significant while the error test does not.
On this basis the preferred model is the SAR model; this is compared below using the LR test and AIC.
Significant variables: Global vs. Spatial Econometric Models
Variable | OLS | SAR | SEM | SDM |
Intercept | 7.392939 (6.45e-07) | 3.9881e+00 (0.0007603) | 5.9084253 (2.688e-07) | 6.63358480 (0.0016362) |
X1 | -0.025496 (0.018816) | | | |
X2 | | | | |
X3 | -0.008816 (0.068722) | | | |
X4 | | | | |
X5 | -0.005189 (0.010631) | -7.8021e-03 (0.0050391) | -0.0088816 (0.0049987) | -0.01162623 (0.0002693) |
X6 | -0.036373 (0.000136) | | | |
X7 | | | | |
X8 | | -2.2939e-02 (0.0022491) | -0.0273602 (0.0006348) | -0.02032539 (0.0115352) |
X9 | | | | |
X10 | | 3.8902e-02 (0.0294126) | 0.0377215 (0.0323737) | 0.03993585 (0.0330746) |
X11 | | | | |
X12 | | | | |
Lagged log y (ρ) | | 0.51702 (2.2838e-07) | | 0.44604 (0.00026544) |
Lagged error (λ) | | | 0.69407 (2.4727e-07) | |
Lag x1 | | | | -0.036 (0.03) |
Lag x4 | | | | -0.035 (0.02) |
Lag x5 | | | | 0.012 (0.03) |
Lag x6 | | | | -0.009 (0.05) |
AIC | 68.614 | 43.838 | 43.991 | 44.858 |
LR test | | | | 23.134 (0.02661) |
N | 105 | 105 | 105 | 105 |
RLM | | 8.6051 (0.003352) | 0.0015024 (0.96910) | |
From the table above, the most appropriate model for the poverty data is the spatial lag model, which has the minimum AIC (43.8). The likelihood ratio common factor test also indicates that the Spatial Durbin Model differs from the spatial error model, that is, it cannot be reduced to the spatial error model, which leaves OLS or the Spatial Lag Model. As discussed above, the OLS results are affected by spatial dependence and even show unexpected signs, since the spatial effects are significant; the best model is therefore the spatial lag model. In this model, the literacy rate and the share of households with higher education have a negative effect on poverty, while unworked hours per week have a positive effect. The spatial lag coefficient is significant (0.51702): since the dependent variable is in logarithms, a 1 percent increase in the average poverty rate of neighboring districts is associated, on average, with an increase of about 0.52 percent in a district's own poverty rate. The highly significant spatial error coefficient also indicates that a random shock to a spatially omitted variable that affects the poverty rate in one location triggers a change in the poverty rate in neighboring locations. The next step is to check that the best model fulfills the required assumptions. Note that the dependent variable was transformed to logarithms, that blank cells in the table indicate variables that are not significant in the corresponding model, and that the values in parentheses are p-values.
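One way to read the size of the spatial lag coefficient is through the reduced form of the lag model, y = (I - ρW)^{-1}(Xβ + ε): a change in one district feeds back through its neighbors. The sketch below computes this spatial multiplier for the reported ρ = 0.51702 on a hypothetical ring of six regions (not the Java Island map); for any row-standardized W, each row of the multiplier sums to 1 / (1 - ρ).

```python
import numpy as np

rho = 0.51702  # spatial lag coefficient reported for the SAR model
n = 6
# Hypothetical ring of six regions, each with two neighbors (row-standardized).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Spatial multiplier matrix (I - rho*W)^{-1} from the reduced form of the lag model.
multiplier = np.linalg.inv(np.eye(n) - rho * W)
total_effect = multiplier.sum(axis=1)   # cumulative effect of a uniform unit change
own_effect = np.diag(multiplier)        # effect of a region's own change, with feedback
```

Under these assumptions the total effect is 1 / (1 - 0.51702), roughly 2.07, so a change that hits all districts at once is about doubled once the feedback between neighbors has worked through.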
The residuals of the lag model show no problem with the normality assumption. The KS test (KS = 0.095238, p-value = 0.7277) does not reject the null hypothesis, so the residuals can be treated as normally distributed. The constant variance assumption was also tested: the BP test gives a p-value greater than 0.05, so the null hypothesis of homoscedasticity (tested against heteroscedasticity, as stated in the research methods) is not rejected and the error variance is homogeneous. The Durbin-Watson statistic for the OLS model (1.56) indicates that the residuals are autocorrelated, so the OLS model is not reliable because this assumption is violated. For the SAR model, DW = 2.24 is greater than dU, so the null hypothesis of no autocorrelation is not rejected and the model is adequate; this was also checked with the Moran test. Multicollinearity was checked using the VIF.
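For reference, the Durbin-Watson statistic used above is the ratio of the sum of squared successive differences of the residuals to their sum of squares. The sketch below is a plain NumPy version with toy residuals; the function name is an assumption, and the numbers are not the study's residuals.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips at every step
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # identical successive values
```

Values near 2 suggest no first-order autocorrelation, values toward 0 positive autocorrelation and values toward 4 negative autocorrelation. The statistic depends on the order in which the residuals are listed, which has no natural meaning for districts on a map, so it is reasonable that the paper also checks the residuals with the Moran test.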
5. Conclusion and Recommendation
5.1. Conclusion
Global (traditional, classical) econometrics has largely ignored spatial dependence between observations and spatial heterogeneity in the relationship being modeled, perhaps because they violate the Gauss-Markov assumptions used in regression modeling. With regard to spatial dependence between observations, recall that Gauss-Markov assumes the explanatory variables are fixed in repeated sampling. Spatial dependence violates this assumption, which gives rise to the need for alternative estimation approaches. Similarly, spatial heterogeneity violates the Gauss-Markov assumption that a single linear relationship with constant variance exists across the sample observations. If the relationship varies as we move across the spatial sample, or the variance changes, alternative estimation procedures are needed to model this variation successfully and draw appropriate inferences. On this basis, comparing the global and spatial econometric models, the study finds that when spatial autocorrelation is present, and with it spatial dependence and heterogeneity, a spatial model is the better choice for fulfilling the required assumptions. Among all the models, the best one for this poverty data is the Spatial Lag Model.
Poverty is a complex phenomenon that cannot be addressed in a short period of time without knowing its significant determinants; once these are known, it can be fought more effectively. Based on the best model (the spatial lag model), this research finds that the literacy rate, the share of households with more than high school education and unworked hours per week are significant determinants of poverty. The coefficient of the literacy rate is negative, indicating a negative relationship between poverty and literacy: the more educated the population, the lower the poverty rate, and the less literate, the poorer. The coefficient of unworked hours is positive, indicating that the more working time is spent without work, the higher the poverty rate.
5.2. Recommendation
At the individual level, the researcher recommends that household members, whether government or private employees, increase their working time, which can help to generate income and alleviate poverty.
At the government level, the researcher recommends that policy focus on developing human capacity by increasing the literacy rate, and that education be free and supported by the government up to the first university degree (strata one), so that educated people can help alleviate poverty in many ways.
Because of resource limitations, this study does not cover all the expected factors, so other researchers can extend it, since poverty is a pronounced deprivation in well-being. The R syntax for all of the above models is available from the author on request.
Acknowledgement
Thanks and Praise to the Living triune God who guided, provided and sustained me with wisdom, courage and perseverance throughout this journey.
References