Modelling of Gully Erosion Site Data in Southeastern Nigeria, Using Poisson and Negative Binomial Regression Models

Gully and other forms of erosion have become the greatest environmental problem facing the people of Southeastern Nigeria. The availability of farmland for agricultural production and construction activities has been greatly reduced by soil erosion. This study applies Poisson and negative binomial regression models to secondary data to identify the major factors that contribute to gully erosion development in Southeastern Nigeria and to ascertain which model is better suited to predicting gully erosion. Maximum likelihood estimation was used to estimate the parameters of the selected model, with the number of gully erosion sites as the response variable (Y) and five explanatory variables (X's). Applying the forward selection criterion to the five explanatory variables, model 5 was found to be the most suitable for forecasting. The Poisson regression model showed overdispersion in the gully erosion site data, since the dispersion parameter (3.677) was greater than 1; the standard errors were therefore underestimated and the coefficients of the explanatory variables overestimated, giving misleading inference. The assessment criteria for the Poisson and negative binomial regression models revealed that the negative binomial regression model predicts gully erosion site data in Southeastern Nigeria better, as considered in this study. Heavy Rainfall (HRF), Extractive Industries (EXI) and Excess Farm Activities (EFX) are the major contributors to gully erosion site development in Southeastern Nigeria, with Heavy Rainfall ranking first. A model suitable for predicting gully erosion sites in Southeastern Nigeria has been developed.


Introduction
Soil erosion is considered to be a major environmental problem since it seriously threatens natural resources and the environment [1]. Soil loss by runoff is a severe ecological problem occupying 56% of the worldwide area and is accelerated by human-induced soil degradation [2]. Soil erosion is a serious environmental, economic and social problem which not only causes severe land degradation and soil productivity loss but also threatens the stability and health of society in general and the sustainable development of rural areas in particular [3].
The menace of soil erosion, especially gully erosion, undoubtedly represents a major ecological challenge facing most states in Nigeria, especially Anambra, Imo, Ebonyi, Abia and other states in the humid tropical regions of southern Nigeria [4]. According to Ofomata [5], the soil of southern Nigeria has high erodibility and is classed as structurally unstable. According to [6], the presence of gully sites is one of the hazardous features that characterize the Southeastern zone. Gully erosion has become one of the greatest environmental disasters facing many towns and villages in Southeastern Nigeria [7,8].
A conservative assessment shows the distribution of known gully sites in different stages of development as follows: Abia (300), Anambra (700), Ebonyi (250), Enugu (600) and Imo (400). These statistics are not exhaustive, as new sites develop during each rainy season due to flooding and torrential rainfall [9,10]. The average depth of gullies in the Ideato North and Ideato South Local Government Areas ranges between 15 and 35 metres, with a cross-sectional area of about 800 metres in some places, and the gullies cover a distance of about 3 km [11]. This paper therefore applies Poisson and negative binomial regression models to gully erosion site data in Southeastern Nigeria to determine which soil erosion parameters contribute most to the gully erosion menace in the region.
A Brief Review of the Causes of Gully Erosion in Southeastern Nigeria
From field observation, productive works and reports of other research carried out within the region, the major causes of gully erosion are as follows. (1) Climatic factors: FAO and Okoroafor concluded that gully erosion results from the action of heavy rainfall on surface earth materials under reduced or altered vegetative cover [12,13]. According to Igwe, the rainfall of southern Nigeria is heavy and aggressive, generating large volumes of runoff that initiate the development of waterways and channels that result in gullies [14]. (2) Soil nature and topography: Southeastern Nigeria is susceptible to gully erosion due to the nature of the soil, topography and geology [15-18]. The Imo/Anambra basin is dominated by the Awka-Orlu cuesta, an area susceptible to ground surface cracks, landslides, mass movement and tectonic movements during the rainy season, which result in all kinds of land degradation, predominantly soil erosion [19,20]. (3) Human factors: According to Egede, the land has been subjected to intensive pressure from human uses that induce degradation, soil loss and erosion. Such human factors include overgrazing, excessive farm activities, clearing of bushes, extractive industries, road construction, overpopulation, lumbering, residential buildings, development of urban centres and poor drainage networks, among others [21]. Uncontrolled mining operations and soil excavation for various developmental projects in the southeast also contribute to gully erosion in the region. The objective of this study is to provide a model that can predict the major contributors to gully erosion sites in Southeastern Nigeria.

Poisson Regression Model
By definition, Y (the dependent variable) follows a Poisson distribution with parameter $\lambda > 0$ if and only if its probability distribution function is given by
$$P(Y = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots$$
such that $E(Y) = \lambda$ and $\mathrm{Var}(Y) = \lambda$. For $n$ independent random variables $Y_1, Y_2, \ldots, Y_n$ with $Y_i \sim P(\mu_i)$, suppose we want to let the mean and the variance depend on the explanatory variables. We can consider a generalized linear model with log link:
$$\log(\mu_i) = X_i^{\top}\beta$$
where $X_i$ denotes the vector of explanatory variables for observation $i$ and $\beta$ denotes the vector of regression parameters.
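As a quick numerical check of the equal mean and variance property stated above, the following Python sketch evaluates the Poisson probability function directly. The rate `lam = 3.5` and the truncation point of 60 are illustrative assumptions, not values from the study.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.5             # hypothetical rate, chosen only for illustration
support = range(60)   # truncation: the tail beyond 59 is negligible for this rate

probs = [poisson_pmf(k, lam) for k in support]
mean = sum(k * p for k, p in zip(support, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(support, probs))

# Both moments should agree with lam up to floating-point error.
print(mean, var)
```

When count data show a sample variance well above the sample mean, this property fails, which is the situation the negative binomial model is meant to handle.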

Model Specification
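The model for the Poisson regression is given as:
$$\log(\mu_i) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_5 x_{i5}$$
where $\alpha$ is the intercept and $\beta_1, \ldots, \beta_5$ are the coefficients of the five explanatory variables. Since the mean is equal to the variance, the usual assumption of homoscedasticity is not appropriate for Poisson data.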
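To illustrate how a log-link specification turns a linear predictor into an expected count, here is a minimal Python sketch. The intercept, coefficients and covariate values are hypothetical placeholders and are not the estimates reported in this study.

```python
import math

def predicted_count(alpha, betas, x):
    """Expected count under a log link: exp(alpha + sum_j beta_j * x_j)."""
    eta = alpha + sum(b * xj for b, xj in zip(betas, x))
    return math.exp(eta)

# Hypothetical values for five explanatory variables (illustration only).
alpha = 1.0
betas = [0.30, 0.10, 0.05, 0.20, 0.15]
x = [2.0, 1.0, 0.0, 1.0, 3.0]

mu = predicted_count(alpha, betas, x)

# Raising the first covariate by one unit multiplies the expected count
# by exp(beta_1), whatever the values of the other covariates.
x_plus = list(x)
x_plus[0] += 1.0
ratio = predicted_count(alpha, betas, x_plus) / mu
print(mu, ratio, math.exp(betas[0]))
```

Because the effect of each covariate is multiplicative on the count scale, the variance (equal to the mean under the Poisson model) changes with the covariates, which is why homoscedasticity cannot hold.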

Parameter Estimation-Iterative Reweighted Least Squares
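Estimation of parameters in Poisson regression relies on the maximum likelihood estimation (MLE) method.
The log-likelihood function is given as:
$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i \log(\mu_i) - \mu_i - \log(y_i!)\right], \qquad \mu_i = \exp(X_i^{\top}\beta)$$
To obtain the maximum likelihood estimate of the parameter $\beta$, we employ the chain rule:
$$\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{n}\frac{\partial \ell_i}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \sum_{i=1}^{n}(y_i - \mu_i)\,x_{ij}, \qquad \eta_i = X_i^{\top}\beta$$
Hence the iterative equation would be written as:
$$\beta^{(m+1)} = (X^{\top} W X)^{-1} X^{\top} W z$$
where $W = \mathrm{diag}(w_i)$ with $w_i = \exp(X_i^{\top}\beta^{(m)}) = \mu_i$, and $z_i = \eta_i + (y_i - \mu_i)/\mu_i$, both evaluated at the current estimate $\beta^{(m)}$.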
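The iteration can be sketched in plain Python as follows. This is a minimal illustration of iteratively reweighted least squares for a Poisson model with log link, not the routine used in the study (which relied on R). The toy data, the starting value `y + 0.5` and the helper `solve` are assumptions made for this example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = X beta by IRLS; returns (beta, fitted means)."""
    p = len(X[0])
    mu = [yi + 0.5 for yi in y]          # common starting value (assumption)
    eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response z and weights w = mu for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(m * xi[a] * xi[b] for xi, m in zip(X, mu))
                 for b in range(p)] for a in range(p)]
        XtWz = [sum(m * xi[a] * zi for xi, m, zi in zip(X, mu, z))
                for a in range(p)]
        new_beta = solve(XtWX, XtWz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(bj * xij for bj, xij in zip(beta, xi)) for xi in X]
        mu = [math.exp(e) for e in eta]
        if step < tol:
            break
    return beta, mu

# Toy data (hypothetical): an intercept column plus one covariate.
X = [[1.0, float(v)] for v in range(6)]
y = [1, 2, 4, 6, 11, 17]
beta, mu = poisson_irls(X, y)

# At the maximum likelihood estimate the score X'(y - mu) is zero.
score = [sum(xi[a] * (yi - m) for xi, yi, m in zip(X, y, mu)) for a in range(2)]
print(beta, score)
```

Checking that the score vanishes at convergence ties the iteration back to the chain-rule derivative of the log-likelihood.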

Negative Binomial Regression (NBR) Model
NBR is a popular generalization of Poisson regression because it loosens the highly restrictive assumption, made by the Poisson model, that the variance is equal to the mean. The traditional negative binomial regression model, commonly known as NB2, is based on the Poisson-gamma mixture distribution. This model is popular because it models the Poisson heterogeneity with a gamma distribution. It is given as:
$$Y_i \sim NB\left(\lambda_i = \exp(X_i^{\top}\beta),\ \psi\right)$$
Compounding the Poisson distribution above with a gamma distribution gives a negative binomial distribution. The analysis then proceeds as follows: (1) The regression coefficients are estimated using the method of maximum likelihood. (2) The significant regression parameters from equation (11) are subjected to a test of hypothesis. (3) The AIC reveals the better model between equation (4) and equation (11); the rank test is then employed to ascertain the levels of contribution of the various factors under consideration to gully erosion in Southeastern Nigeria.
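The Poisson-gamma mixture can be illustrated by simulation. The Python sketch below draws a gamma-distributed rate for each observation and then a Poisson count given that rate. The mean rate of 4, the gamma shape of 2 (standing in for the dispersion parameter, under the assumption that the variance is $\lambda + \lambda^2/\psi$) and the sample size are hypothetical choices, and the Poisson sampler is a simple textbook method rather than anything used in the study.

```python
import math
import random

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(42)
mean_rate = 4.0   # hypothetical lambda
shape = 2.0       # hypothetical gamma shape, playing the role of psi
n = 20000

counts = []
for _ in range(n):
    # Gamma-distributed rate with mean `mean_rate` (scale = mean / shape).
    lam_i = rng.gammavariate(shape, mean_rate / shape)
    counts.append(draw_poisson(lam_i, rng))

sample_mean = sum(counts) / n
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (n - 1)

# Under this parameterization the variance is lambda + lambda**2 / shape,
# so the sample variance should clearly exceed the sample mean.
print(sample_mean, sample_var)
```

The extra variance contributed by the gamma-distributed rate is what allows the negative binomial model to accommodate overdispersed counts.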

Model Evaluation
In this research work, the techniques below are applied to evaluate the models and to ascertain the better model for modelling gully erosion site data in Southeastern Nigeria.
Deviance: a goodness-of-fit statistic for a model that is often used for statistical hypothesis testing and for comparing two different models. A larger deviance indicates poorer performance of the model and a smaller deviance indicates better performance.
$$D = 2\left[\ell(y, \phi; y) - \ell(\hat{\mu}, \phi; y)\right]$$
where $\ell(y, \phi; y)$ is the log-likelihood of the full model and $\ell(\hat{\mu}, \phi; y)$ is the log-likelihood of the current model. AIC: The Akaike information criterion is used to select the model that best fits the data. The model with the smaller AIC is the better one.
In the AIC, n is the sample size and K is the number of predictors. Pearson residual: used to check model fit with respect to each explanatory variable. It measures the discrepancy between the observed and fitted values for each observation.
In modelling the number of gully sites in Southeastern Nigeria, using Owerri, Abia and Enugu as a pivot point for the work, R statistical software version 3.3.3 was used. A generalized linear model (GLM) with Poisson as the fundamental distribution for count data was fitted using the log link function, and the negative binomial distribution was later employed to correct for overdispersion where the Poisson regression results showed it.
Table 3 presents the parameter estimates of the selected model for gully erosion in five states of Southeastern Nigeria. The Akaike information criterion (AIC) of this model was 142.5, with a null deviance of 622.402 on 14 degrees of freedom and a residual deviance of 33.094 on 9 degrees of freedom, following the chi-square ($\chi^2$) distribution on one degree of freedom. The dispersion parameter was found to be 3.677 (i.e. the residual deviance divided by its degrees of freedom, as seen in Table 5), and the omnibus test gave a p-value of 0.000, which implies that the model is significant at the 5% α-level. However, the Poisson assumption of equal mean and variance has been violated, since the dispersion parameter is not approximately equal to 1: its value of 3.677 is greater than 1, a clear indication of overdispersion in the gully erosion data. This further implies that the parameters of the stated model have been overestimated and the corresponding standard errors underestimated, consequently giving misleading inference about the regression parameters.
To address this issue, negative binomial regression was used to modify the model and nullify the effect of overdispersion in the data; the result is shown in Table 4. From Table 4 it can be seen that Heavy Rainfall, Nature of the Soil, Excess Farm Activities and Extractive Industries were all statistically significant (p-value < 0.05), while Topography was not statistically significant (p-value > 0.05).
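The evaluation quantities used in this section can be sketched in a few lines of Python. The functions below compute the Poisson deviance, Pearson residuals and an AIC in its standard form (twice the number of parameters minus twice the log-likelihood); the standard form is an assumption here, since the paper's own AIC expression is not reproduced in the text. The toy `y` and `mu` values are hypothetical, while the residual deviance and degrees of freedom are the figures reported for the Poisson fit.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood for observed counts y and means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """D = 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

def pearson_residuals(y, mu):
    """(observed - fitted) / sqrt(fitted), since Var(Y) = mu under Poisson."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def aic(loglik, n_params):
    """Standard AIC form (assumed): 2k - 2*loglik."""
    return 2 * n_params - 2 * loglik

# Hypothetical observed counts and fitted means, for illustration only.
y = [2, 5, 9]
mu = [3.0, 4.0, 10.0]
dev = poisson_deviance(y, mu)
saturated = poisson_loglik(y, [float(v) for v in y])
fitted = poisson_loglik(y, mu)
residuals = pearson_residuals(y, mu)

# Dispersion check using the residual deviance and degrees of freedom
# reported for the Poisson model.
residual_deviance = 33.094
residual_df = 9
dispersion = residual_deviance / residual_df
print(dev, aic(fitted, 2), round(dispersion, 3))
```

A ratio of residual deviance to degrees of freedom well above 1, as obtained here, is the usual signal that a Poisson model is overdispersed.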

Interpretation of Coefficients
The variables Heavy Rainfall (HRF), Nature of the Soil (NOS), Excess Farm Activities (EFX) and Extractive Industries (EXI) were all statistically significant at the 5% α level. This implies that gully erosion in the Southeastern region of Nigeria is not a result of poor drainage alone; all the factors listed above contribute significantly to the increasing number of deadly gully erosion sites seen or documented within the region. From this study, it can be established that Heavy Rainfall, Nature of the Soil, Excess Farm Activities and Extractive Industries are the major contributors to gully erosion in the Southeastern region of Nigeria.
From Table 4 it is observed that, relative to the Poisson fit, the parameter estimates have been reduced and the standard errors have increased. The parametric analysis comparing the goodness of fit of the Poisson and negative binomial regression models is shown in Table 5.

The Model for Predicting the Number of Gully Erosion Sites in the Southeastern Region of Nigeria
A model for the number of gully erosion sites was obtained from the negative binomial regression. Table 6 ranks the p-values of the better model (negative binomial regression) to identify which parameters were more significant and which factors contributed more than others. Heavy Rainfall was ranked first. The implication is that Heavy Rainfall is the most statistically significant factor contributing to the high number of gully erosion sites in Southeastern Nigeria, followed by Extractive Industries, Excess Farm Activities and Nature of the Soil, while Topography was not a significant contributor in this study.

Conclusion
The extent of soil erosion in the study area is still increasing and is now a major cause for concern. From field observation, productive works and reports of other researchers within the region, the major causes of gully erosion stem from climatic factors, soil nature, topography and human factors. A generalized linear model (GLM) with Poisson as the fundamental distribution for count data was fitted using the log link function, and the negative binomial distribution was later employed to correct for the overdispersion shown by the Poisson regression results. The negative binomial regression model is therefore the better model for modelling gully erosion in the Southeastern region of Nigeria, as considered in this study. Heavy Rainfall (HRF), Nature of the Soil (NOS), Excess Farm Activities (EFX) and Extractive Industries (EXI) are the major factors that contribute to the high number of gully erosion sites in Southeastern Nigeria. The model that can be used to forecast the future number of gully erosion sites in the region, keeping the factors considered in this study constant, is shown below: