Modeling Burglar Incidents Data Using Generalized and Quasi Poisson Regression Models: A Case Study of Nairobi City County, Kenya

Serious violent crimes, including burglary, trafficking of dangerous drugs and sexual offenses, make up the bulk of incidents filed at police stations daily. These crime-related activities pose a serious threat to the peace and serenity of a nation as far as safety is concerned. Burglar incidents data are often discrete and do not conform to the general assumptions of the linear model and its variants. Ordinarily, such data could be modeled using a linear regression approach to derive the relationship between the response variable and the underlying covariates. However, the narrowing of the gap between city and suburban burglar crime rates brings about variability that invalidates the application of ordinary linear regression approaches. The main objective of this study was to compare the Generalized Poisson and Quasi-Poisson models as alternatives to the classical linear regression approach in modeling Burglar incidents in Nairobi City County, Kenya. The prime advantage of applying the Quasi Poisson in count data analysis is that it fixes the basic fallacy of assuming homogeneity in data and allows estimation of dispersion. The study used secondary data covering eight (8) of Nairobi's Administrative Divisions from the National Crime Research Centre (NCRC) for the period 2016-2018. The comparison criteria were the Akaike Information Criterion (AIC) and the Deviance Information Criterion (DIC), alongside other model diagnostic tests. Application of these models to burglar events revealed that the number of incidents in the study area is under-dispersed, with the risk of experiencing Burglar crime being above 5% in all the locations surveyed. In exploring the Burglar-to-location relationship, the study found that the Generalized Poisson Model performed better than the Quasi Poisson model, having posted the lowest AIC value.


Introduction
In Kenya, the National Police Service (NPS) and other security agencies employ location crime mapping to quantify and estimate the extent of risk associated with crime. Crime count per location can be used as an indicator of the nature and degree of crime mitigation measures needed, and can be divided into many classes, including Burglar crimes. Burglar crimes in a metropolitan setup are sufficiently regular and significant in occurrence, leading to loss of property, murder and kidnapping, among other outcomes with monetary attachment. Criminal offenses are predominant not only in the majority of urban settlements but also in all societies across the world. The negative impact of such activities on a nation's wellbeing is enormous. The most directly affected are women, children and investors, among others who form an integral part of a nation's peace and serenity. The approach to criminal activities keeps evolving with modern-day technology; different methods have been devised in different cases that are distinct and unique, making it hard for the policing authorities to correctly identify and tackle the menace. The combined effect of monetary losses and the associated pain deprives an economy of its potential to grow in terms of GDP and of its reputation at the global level. Nairobi, being an economic hub for East and Central Africa, has attracted high populations, both local and foreign, for trade, education and tourism activities. The general increase in population, characterized by a high unemployment rate amid an increased cost of living, is a threat to security [17]. An Economic Survey by the Kenya National Bureau of Statistics approximated the number of Kenyans living in slums to be close to 4.0 million, which constitutes 55% of the entire urban population, the majority of whom are in Nairobi City. Figure 1 below shows the 8 locations covered under the study with their respective population density exposed to the risk of Burglar crimes.
It is interesting to note that the majority of the population lives in the slums.
Source: National Crime Research Centre.
According to an Economic Survey by the National Crime Research Centre, the Nairobi City police command registered the highest number of all crimes reported in the country, with a crime index of 166 representing 8.1% of the total crimes committed. Under this survey, the number of persons reported as having committed burglary and theft-related crimes increased to 6.6% in 2018 compared to 2017 [6]. These worsening trends in crime need to be constantly researched and appropriate mitigation measures instituted to curb their prevalence.
A safe and secure environment is a pillar in itself towards the Government's achievement of the "big four agenda". In the FY2019/2020 Budget, the national security allocation cost taxpayers a whopping Kshs. 140.5 billion in the mission to curb the crime rate, burglary included. It is against this backdrop that the study was conducted to predict Nairobi's burglary levels and to gain more insight into the topic for the purpose of policing and mitigation measures by the relevant authorities.
In crime research, ordinary least squares, the Poisson, the Quasi Poisson and functional forms of the Negative Binomial have been widely proposed and employed to study and analyze the relationship between crime count and socio-economic and demographic variables. For instance, Malina, Z & Ahmad, M. used the functional forms of the Negative Binomial distribution to analyze car theft cases in Malaysia. The study aimed at coming up with a predictive model for Malaysia using data obtained from 10 general insurance companies and the Royal Malaysia Police. The data consisted of vehicle type, CC rating, age and use. The study found that vehicles above 8 years with a CC rating above 1800 were more vulnerable to theft. Further, the study conducted a likelihood ratio test for model fitness. NB-P was preferred as the best model fit since it parametrically nested both NB-1 and NB-2, allowing statistical tests on the two models against a more general alternative [1].
The study by Janguo. C. et al. used the Geographical Weighted Negative Binomial regression (GWNB) and Geographical Weighted Poisson regression (GWP) for integrative analysis of spatial heterogeneity and over-dispersion of crime relative to location. Model residuals were analyzed using the deviance information, and the study established that GWNB and GWP outperformed the negative binomial regression in reducing spatial dependency in residuals. Upon comparing the model likelihood ratio estimates, the study established that GWNB was a better fit than GWP in modeling over-dispersed count data. The study further advocated for incorporation of over-dispersion into spatial heterogeneity to improve results in modeling crime-related data [5,10].
Various statistical approaches have since been used in quantitative studies to explore the relationship between crime and the influencing factors; such approaches include simple regression models [9]. Whereas OLS assumes that the dependent variable is continuous, crime counts including Burglar are discrete in nature, which invalidates its application since an extrapolation of the line of best fit gives negative values [7]. Therefore, the Poisson family of distributions, which includes the Generalized and Quasi Poisson among others, has been adopted for crime modeling alongside the Negative Binomial model [14,19].
The Standard Poisson is a global technique which assumes that the connection between crime and related factors is constant across space. This is not always true for real case count data [11].
Due to the variations in which crimes occur and the stochastic nature of Burglar activities from one location to another and from one occurrence to another, it is unrealistic to describe the influence of a risk factor on Burglar crime as homogeneous. The Quasi Poisson is popular for this reason, as it assumes the variance to be some function of the mean. This makes its application suitable for both over- and under-dispersed count data, as it estimates and quantifies the extent of dispersion. The Standard Poisson makes an assumption of equi-dispersion, which is usually violated in real case data [2,3,12,15].
This paper therefore suggests the use of two well-known Poisson model extensions, the Quasi Poisson and the Generalized Poisson models, for the study and analysis of Burglar crime data for Nairobi City, Kenya. The study contributes to the existing literature in the sense that little is known with regard to location and Burglar crime in Nairobi. The classification of locations with their respective Burglar indexes is important to the national policing authorities and security agencies. The results of the study are also critical to the insurance industry in coming up with rating factors for home insurance products, including Burglar policies by location. The advantage of invoking the GP in the study is that it parametrically nests both the Quasi Poisson and the Standard Poisson, making statistical tests for QP and SP easier against a more general alternative. The Quasi Poisson does not give an AIC for model evaluation because it is not specified as a full regression model with a likelihood, but only through a mean-variance relationship [13].
The GPRM proposed by Consul, P. et al [8] was used to model count data affected by a number of variables, that is, the response variable was expressed as a log-linear function of the variables, where the variation between mean and variance was explicitly studied. This application has since gained popularity, extending to household fertility modeling [16] and analysis of injury data at emergency facilities, among others. The wide use of this model is attributed to its ability to handle dispersion in data.
In order to assess the adequacy of the count data models fitted, the study employed several goodness-of-fit tests, which include Cameron and Trivedi's test for dispersion, Pearson chi-square, deviance and the Hosmer-Lemeshow test. These goodness-of-fit tests form the basis for evaluating model performance and for comparison to identify the best model among the three competing approaches in the study.
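As a rough illustration of the dispersion test named above, the following sketch computes a Cameron and Trivedi style statistic against the alternative Var(y) = mu + alpha * mu. It assumes the regression-based form of the test, in which z = ((y - mu)^2 - y) / mu is regressed on a constant. The function name and the counts are hypothetical and are not the study data.

```python
import math


def cameron_trivedi_statistic(y, mu):
    """Return (alpha_hat, t_stat) for H0: Var(y) = mu vs Var(y) = mu + alpha * mu.

    z_i = ((y_i - mu_i)^2 - y_i) / mu_i is regressed on a constant, so
    alpha_hat is the mean of z and t_stat is its usual t-ratio.
    """
    n = len(y)
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    alpha_hat = sum(z) / n
    s2 = sum((zi - alpha_hat) ** 2 for zi in z) / (n - 1)
    t_stat = alpha_hat / math.sqrt(s2 / n)
    return alpha_hat, t_stat


# Hypothetical weekly counts with little spread around their mean.
y = [12, 13, 11, 12, 14, 12, 13, 11]
mu = [sum(y) / len(y)] * len(y)  # fitted means from an intercept-only model
alpha_hat, t_stat = cameron_trivedi_statistic(y, mu)
print(alpha_hat, t_stat)
```

A negative alpha_hat with a large negative t-ratio points towards under-dispersion, while a positive one points towards over-dispersion.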

Standard Poisson Regression Model
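Letting $Y_1, Y_2, \ldots, Y_n$ be a vector of random variables, where $Y_i$ is the number of Burglar occurrences that has a Poisson distribution and $n$ is the sample size, then $Y_i$ has a probability mass function defined by;
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots \tag{1}$$
The model has its mean and variance expressed as $E(Y_i) = \mathrm{Var}(Y_i) = \mu_i$, presenting an equi-dispersion assumption. To incorporate the locations over which Burglar counts were recorded into the model while ensuring non-negative values, the mean of the fitted data is assumed to follow a log-link written as;
$$\log(\mu_i) = x_i^{T}\beta \tag{2}$$
where $x_i$ represents the location effect and $\beta$ is a vector of regression parameters. The MLE of $\beta$ can be obtained by maximizing the log-likelihood function;
$$\ell(\beta) = \sum_{i=1}^{n}\left[\,y_i\,x_i^{T}\beta - \exp\!\left(x_i^{T}\beta\right) - \log(y_i!)\,\right] \tag{3}$$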
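A minimal sketch of this model when location is the only covariate is given below. It relies on the fact that, for a log-link Poisson model with a single dummy-coded factor, the maximum likelihood fitted mean of each location equals that location's sample mean, so no iterative solver is needed. The function names, location labels and counts are hypothetical and are not the study data.

```python
import math
from collections import defaultdict


def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def fit_location_poisson(counts, locations, baseline):
    """MLE for log(mu) = intercept + location effect (dummy coding).

    With location as the only covariate, the fitted mean of each location
    is its sample mean, which gives the coefficients in closed form.
    """
    groups = defaultdict(list)
    for count, loc in zip(counts, locations):
        groups[loc].append(count)
    means = {loc: sum(v) / len(v) for loc, v in groups.items()}
    intercept = math.log(means[baseline])
    effects = {loc: math.log(m) - intercept
               for loc, m in means.items() if loc != baseline}
    return intercept, effects, means


# Hypothetical weekly counts for two made-up locations (not the study data).
counts = [10, 12, 14, 20, 22, 24]
locations = ["LocA", "LocA", "LocA", "LocB", "LocB", "LocB"]
intercept, effects, means = fit_location_poisson(counts, locations, "LocA")
fitted = [means[loc] for loc in locations]
print(math.exp(intercept), math.exp(effects["LocB"]))
print(poisson_loglik(counts, fitted))
```

Exponentiating the intercept recovers the fitted mean count of the baseline location, and exponentiating a location effect gives the ratio of that location's mean to the baseline mean.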

Quasi Poisson Model
The Quasi Poisson differs from the Standard Poisson only in the model standard errors. The mean function remains the same, while the variance is assumed to be some function of the mean, simply expressed as;
$$\mathrm{Var}(Y_i) = \phi\,\mu_i$$
where $\phi$ is the dispersion parameter. Unlike the Standard Poisson, this model does not specify a probability distribution function, hence its parameters cannot be estimated by full MLE. Instead, maximizing the Poisson likelihood yields quasi-maximum likelihood estimators that are identical to the usual MLEs [4,18]. Because a likelihood for the model does not exist, the AIC model evaluation criterion cannot be used.
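The sketch below illustrates one common way of estimating the dispersion in a quasi-Poisson fit. It assumes the Pearson chi-square statistic divided by the residual degrees of freedom as the estimator, and rescales Poisson standard errors by the square root of that estimate. The function names and counts are hypothetical, and the source does not state which estimator the study software used.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Estimate phi as the Pearson chi-square over residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def quasi_standard_error(poisson_se, phi):
    """Rescale a Poisson standard error by sqrt(phi)."""
    return poisson_se * math.sqrt(phi)


# Hypothetical counts with little spread around their mean (not the study data).
y = [12, 13, 11, 12, 14, 12, 13, 11]
mu = [sum(y) / len(y)] * len(y)  # intercept-only fitted means
phi_hat = pearson_dispersion(y, mu, n_params=1)
print(phi_hat)
print(quasi_standard_error(0.10, phi_hat))
```

An estimate below 1 shrinks the standard errors relative to the Poisson fit, while the coefficient estimates themselves are unchanged.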

Generalized Poisson Regression Model
The Generalized Poisson model is an extension of the ordinary Poisson, otherwise known as the Standard Poisson, with an extra dispersion parameter, $\alpha$. The GPR model reduces to the Poisson in the limit as $\alpha \to 0$. If $\alpha > 0$, the variance is greater than the conditional mean (over-dispersion); if $\alpha < 0$, the mean exceeds the variance (under-dispersion). In real cases, however, as for the Burglar count data, the variance and mean are not equal, which invalidates Poisson-based inference that relies on the estimated standard errors. The probability mass function for the model is expressed as in equation (4), from which the mean and variance of the model follow. The parameters $\alpha$ and $\beta$ can be obtained by maximizing the log-likelihood function using the differencing method [8].
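The displayed equations for this section are not recoverable from the text. For orientation, a widely used form of the Generalized Poisson regression model is sketched below. It is an assumption that the study used this parameterization. The dispersion parameter is written here as $\alpha$ and the mean as $\mu_i = \exp(x_i^{T}\beta)$.

```latex
\begin{align}
P(Y_i = y_i) &= \left(\frac{\mu_i}{1+\alpha\mu_i}\right)^{y_i}
  \frac{(1+\alpha y_i)^{y_i-1}}{y_i!}
  \exp\!\left(-\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}\right),
  \qquad y_i = 0, 1, 2, \ldots \\
E(Y_i) &= \mu_i, \qquad \mathrm{Var}(Y_i) = \mu_i\,(1+\alpha\mu_i)^2 \\
\ell(\alpha,\beta) &= \sum_{i=1}^{n}\left[\, y_i \log\frac{\mu_i}{1+\alpha\mu_i}
  + (y_i-1)\log(1+\alpha y_i)
  - \frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}
  - \log(y_i!) \,\right]
\end{align}
```

Under this form the variance equals the mean when $\alpha = 0$, exceeds it when $\alpha > 0$, and falls below it when $\alpha < 0$, provided $1+\alpha\mu_i > 0$ and $1+\alpha y_i > 0$.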

Preliminary Data Analysis
In this study, weekly burglar counts covering 8 administrative locations in Nairobi were obtained from the National Crime Research Centre. The data presented a skewed distribution, with the majority of the points concentrated to the left of the median count. Furthermore, the study incorporated under-dispersion into the regression models in order to analyze variation from one location to another and from one occurrence to the other. Table 1 below shows the preliminary results. Initially, there is a total of 8 × 52 × 3 = 1,248 counts to fit in the models.
Figure 2 below shows the histogram and density distribution for the Burglar incidents in Nairobi City County, which were positive integer values. Burglar counts below 10 had the most significant frequency in the data compared to incidents above 25. The absence of zeros in the data invalidated the application of zero-inflated count models. Given these data characteristics, the Poisson family of distributions seemed to be the ideal choice for modeling the real case.
Figure 3 shows how the normal curve failed to fit the data, invalidating the use of linear regression models. This is a clear indication that even though Burglar crimes exist in the society, they are not so extreme as to justify assuming normality. Consequently, the occurrence of one burglar event does not prompt the occurrence of another, confirming that the events are random and independent of each other.
From Table 2, the study notes that both the Standard Poisson and Quasi Poisson models yield the same parameter estimates and residual deviance, except for the standard errors, which are significantly different. For the under-dispersed case, the standard errors in the Standard Poisson were greater than in the Quasi-Poisson, that is, the Standard Poisson over-estimates errors for under-dispersed data. In practice, however, the equi-dispersion notion in the Poisson GLM misstates the variance in the data, as depicted in Table 2.
All model coefficients are significant, with p-values falling below 0.05 across all locations. Model coefficients represent the risk factor (exposure rate), that is, the chances of an individual experiencing a Burglar event given a unit increase in population for a given location, holding other factors constant. Westland registered the highest exposure rate at 7.647%, possibly due to its densely populated Kangemi slum, characterized by the existence of organized criminal groups. The intercept estimate of 1.7011 represented a general risk factor in the study area, that is, without exploring Burglar by location, a unit increase in Nairobi's population leads to at least 70.11% chances of experiencing a Burglar incident (intercept-only model).

Model Parameters
In summary, the presence of under-dispersion in count data overstates the standard errors, as in the Standard Poisson output. The dispersion parameter estimated by the Quasi-Poisson approach revealed that the variance was 0.0405 times the mean value ($\phi = 0.0405$). The data had a greater mean compared to the variance, presenting an under-dispersed case.
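To make the effect of this estimate concrete, the arithmetic below assumes that quasi-Poisson standard errors are obtained by scaling the Poisson standard errors by the square root of the dispersion estimate, as is conventional for quasi-likelihood fits. The subscripts QP and P are labels introduced here for the two models.

```latex
\begin{align}
\mathrm{Var}(Y_i) &= \hat{\phi}\,\mu_i = 0.0405\,\mu_i \\
\mathrm{SE}_{\mathrm{QP}} &= \sqrt{\hat{\phi}}\;\mathrm{SE}_{\mathrm{P}}
  = \sqrt{0.0405}\;\mathrm{SE}_{\mathrm{P}} \approx 0.201\,\mathrm{SE}_{\mathrm{P}}
\end{align}
```

Under that assumption, the Standard Poisson standard errors would be roughly five times larger than their Quasi-Poisson counterparts.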

The Generalized Poisson Model Parameters
From Table 3 below, all locations under consideration were found significant to the study since all the p-values were below the 5% level of significance. Generally, the standard errors for the model are close to those obtained under the Quasi Poisson model. This is a clear indication that the errors in the Standard Poisson were overstated due to under-dispersion. In both cases, however, the coefficients are interpreted in terms of unit change: for a unit change in a predictor, the log of the expected count changes by the respective regression coefficient, given that the other predictor variables are held constant.
The Hosmer-Lemeshow test statistic was used to determine how well the models fit the data, depending on the difference between the estimated and the observed Burglar counts in the study area. From the test results, all three models, having recorded values above 0.05, gave a good fit, with the Generalized Poisson being considered best for having posted the largest margin above the 5% level of significance.

Pearson's Chi Square and Deviance Test
From Table 4 below, both the Deviance and Pearson's Chi-square statistics are relatively small and close to each other, that is, 5.5326 and 5.4738 respectively. These values are consistent with a good fit. The Chi-square statistic is consistently close to the 5% critical value, implying that the models gave a good fit to the burglar data.
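For reference, the two statistics can be computed from observed counts and fitted means as in the sketch below, which uses the standard Poisson deviance and Pearson chi-square formulas. The function names and counts are hypothetical and do not reproduce the values in Table 4.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pearson_chi2(y, mu):
    """Pearson chi-square for a Poisson fit: sum((y - mu)^2 / mu)."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


# Hypothetical counts and intercept-only fitted means (not the study data).
y = [12, 13, 11, 12, 14, 12, 13, 11]
mu = [sum(y) / len(y)] * len(y)
deviance = poisson_deviance(y, mu)
chi2 = pearson_chi2(y, mu)
print(deviance, chi2)
```

When the fitted means track the data closely the two statistics are near each other, which is the pattern the section reports.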

AIC and Deviance
For testing the adequacy of the Generalized Poisson model versus the Standard Poisson, AIC values were used. The GPRM posted the lowest AIC, qualifying it as the better model. Table 5 below gives a summary of the model selection criteria. The Quasi Poisson gives neither a log-likelihood nor AIC values since it is specified only through a mean-variance relationship rather than a full probability model. Evaluation of this model is therefore based on the residual deviance, which was found to be less than the null deviance.
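The selection rule can be sketched as follows, using the usual definition AIC = 2k - 2 log L, where k is the number of estimated parameters. The log-likelihood values and parameter counts below are hypothetical placeholders, not the figures in Table 5.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*logL (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs, not the study values.
candidates = {
    "Standard Poisson": (-3100.0, 8),
    "Generalized Poisson": (-2400.0, 9),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

The Quasi Poisson is left out of such a comparison because it has no likelihood to plug into the formula.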

Model Prediction
From Figure 3, the study notes that counts between 10 and 20 had the most significant frequency when compared to counts above 25. From the predictions, the study notes that there are minimal chances of experiencing a zero Burglar count in future.
The predicted minimum and maximum counts exceed the observed values of 8 and 30 respectively. This is an indication of increased Burglar crimes in the near future compared to the period 2016-2018 covered by the study. Table 6 below compares the actual and predicted Burglar crime counts. From the comparison, the majority of the predicted counts are concentrated to the left of the mean, presenting a skewed distribution similar to that inferred from Figure 2. This distribution feature is typical of count data. Judging from this visual presentation, the Generalized Poisson model gave an almost perfect fit, reliable for future planning and forecasting of crime levels in Nairobi City County, Kenya.

Conclusion
This study proposed the use of QPR and GPR models as an alternative approach to describe the relationship between Burglar count and location. The study has shown that for under-dispersed count data, the GPR is better than both the Standard and Quasi Poisson models. This is because the ordinary Poisson regression model assumes that the variance of the observed count data is the same as the mean value, that is, the method does not allow estimation of the dispersion in its analysis. This assumption is often violated by real data sets, leading to overestimation or underestimation of standard errors and consequently to wrong inferences.
Owing to the weaknesses of the Standard Poisson regression model, there was a need to use other robust count data models, such as the Negative Binomial, Quasi Poisson and Generalized Poisson among others, that could easily accommodate such variations in data. These approaches assume the variance to be some function of the mean.
The Quasi Poisson and the Generalized Poisson were two such models that the study employed in modeling real case count data. The relationship between Burglar count (risk) and location, studied through the Generalized and Quasi Poisson models, could be used as an overview by residents and the local administration in addressing this crime. Publication of the study findings would create awareness of location risks and help reduce the crime through community policing. For academics, fitting the Poisson family of distributions to the data led to the exploration of more complex models for under-dispersed data, such as the Quasi and Generalized Poisson.
From the model results and diagnostic tests, the study found the Generalized Poisson ideal for Burglar incidents data in Nairobi, especially for count data characterized by under-dispersion. The model posted the lowest AIC and had the best visual predictions over the rest. Even though the Quasi Poisson was comparable to the latter, its predictions were unreliable as there was no likelihood or AIC to evaluate its adequacy. The conclusions of the study on the Generalized Poisson are similar to those of a study by Tammy, H., which found that the GPRM, apart from being robust to variations, was ideal for under-dispersed ecological data, applied in a garden case to the number of wilted plants [3].

Significance of the Study
This study is important to the City's economic, social and political stability, as it will provide a guide to the police departments in making decisions on operational matters, that is, give a guiding frame on the police-to-citizen ratio and possible locations in which to beef up security. It will also provide insightful information to insurance companies underwriting the Burglar class of cover, as it gives Burglar risk by location, which is a key rating factor.

Recommendations
This research paper is a milestone achievement towards comprehensive analysis of residential Burglar risk by location in Nairobi City County, Kenya. The study believes that count data should not be transformed, but instead that the Poisson model and its extensions should be applied in such cases. The results of this study provide ample suggestions for future research, as noted below. First, it will be necessary to carry out a related study with the prime objective of finding regression models for crime risk as a whole, such that the relationship between different types of crime can be explored fully. Second, location classification with the help of geo-spatial information (GIS) should be considered in order to uniquely map the various locations to their respective crime indexes.
Lastly, the study recommends that future studies consider other means of Poisson-generated populations, other under-dispersion parameters and other proportions of zeros. The research should also be extended to incorporate random effects (REH) and more covariates, including latent variables.
me their experiences in research based writing. I am greatly indebted to their enthusiasm, support and guidance throughout the journey.
Lastly, I would like to sincerely thank my Mother, Julia Muchika, and the entire family with utmost gratitude for their absolute financial and moral support on everything.