American Journal of Theoretical and Applied Statistics
Volume 6, Issue 1, January 2017, Pages: 38-43

Principal Component Analysis of Crime Data in Gwagwalada Area Command, Abuja from 1995 – 2015

Nasiru Mukaila Olakorede, Samuel Olorunfemi Adams, Samuel Olayemi Olanrewaju

Department of Statistics, Faculty of Science, University of Abuja, Abuja, Nigeria

Email address:

(N. M. Olakorede)
(S. O. Adams)

To cite this article:

Nasiru Mukaila Olakorede, Samuel Olorunfemi Adams, Samuel Olayemi Olanrewaju. Principal Component Analysis of Crime Data in Gwagwalada Area Command, Abuja from 1995 – 2015. American Journal of Theoretical and Applied Statistics. Vol. 6, No. 1, 2017, pp. 38-43. doi: 10.11648/j.ajtas.20170601.15

Received: July 15, 2016; Accepted: July 25, 2016; Published: February 6, 2017


Abstract: This paper analyses Abuja crime data consisting of the averages of thirteen major crimes reported to the police for the period 1995 – 2015. Correlation analysis and principal component analysis (PCA) were employed to explain the correlation between the crimes and to determine the distribution of the crimes over the three Area Councils under the Gwagwalada Area Command. The results show a significant correlation among robbery, rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape. Gwagwalada Area Council has the highest overall crime rate in the Area Command, i.e. rape, robbery, GHW, theft, assault, murder and unlawful escape. Unlawful possession, breach of peace and broken store are more prevalent in Kwali Area Council, while vehicle theft, car stealing and burglary are more prevalent in Kuje Area Council. The PCA suggests retaining three components (Rape, Robbery and GHW) that explain about 86.873 percent of the total variability of the data set.

Keywords: Multivariate Analysis, Factor Analysis, PCA, GHW, DPHs


1. Introduction

In recent times there has been an increase in reported cases of crime across the country and consequently in the FCT Abuja, raising the question of which categories of crime are committed in the Federal Capital Territory. The scope of crime prevention has grown considerably in the last few years. What was previously the sole concern of the police and the private security industry has spread to real estate developers, car manufacturers, residents' groups and the builders of public facilities such as society offices and shopping centers; all of this calls for continuously improved ways of preventing crime. Criminal victimization has serious consequences for citizens and society, because a high standard of living is undermined by a high level of criminal victimization [1]. Crime in the city of Abuja has increased since the relocation of the federal capital to Abuja, and crime is one of the persistent problems that bedevil the existence of mankind; this paper therefore presents a statistical analysis of the crime rate in Abuja. Principal component analysis (PCA) is very useful in crime analysis because of its robustness in data reduction and in determining the overall criminality of a given geographical area. PCA is a data analysis tool that is usually used to reduce the dimensionality (number of variables) of a large set of interrelated variables while retaining as much of the information (variation) as possible.

The literature shows that crime data have previously been analyzed using principal component analysis (PCA). [3] classified US cities as safe or unsafe by using the multivariate methods of principal components, factor analysis and discriminant analysis to reduce the 14 distinct variables that can affect the crime rate of a city to 6 and 7 important variables that show a high correlation with all the variables. Principal component analysis decreased the number of variables to 6 and accounted for 86% of the total variance, while factor analysis decreased the number to 7 and accounted for 79.7% of the total variance [11]. [5] analyzed Oyo State crime data consisting of 8 major crimes reported to the police in the period 1996 – 2014. They employed correlation analysis and principal component analysis to explain the correlation between the crimes in the state. Their results showed a significant correlation, and the principal component analysis suggested retaining 6 components that explained about 83.79% of the total variability of the data set. [10] employed face-to-face personal interviews using a stratified multistage random selection procedure. They applied PCA to analyze the spatial pattern of criminal victimization in the 11 Local Government Areas in Katsina Senatorial Zone, and found that Batsari has the highest overall average victimization while Rimi has the lowest average victimization in the zone. [8] used Katsina State data consisting of the averages of eight major crimes reported to the police for the period 2006 – 2008. Correlation analysis and principal component analysis (PCA) were employed to explain the correlation between the crimes and to determine the distribution of the crimes over the local government areas of the state. The results showed a significant correlation between robbery, theft and vehicle theft. The PCA suggested retaining four components that explain about 78.94 percent of the total variability of the data set.

This paper explores the use of correlation analysis and PCA for effective crime control and prevention. The objectives of this research are: to determine the bivariate association existing between pairs of crime types, to identify the crime that is dominant in FCT Abuja, and to reduce the dimension of the data using principal component analysis (PCA).

PCA offers a tool for reducing the dimensionality of a very large data set and for determining the areas with the highest overall crime rate. These tools, if properly implemented, can help to address many of the complex criminal problems that have bedeviled the country in general and Abuja in particular.

2. Data Description

The crime data for the period 1995 – 2015 for the 3 Divisional Police Headquarters (DPHs) in the Gwagwalada Area Command, Abuja, were collected officially from the FCT Police Headquarters, Abuja. For ease of statistical analysis and interpretation, the 3 DPHs were categorized according to the three Area Councils (AC) under Gwagwalada Area Command, i.e. Gwagwalada, Kuje and Kwali Area Councils. The data consist of thirteen major crimes reported to the police within the period 1995 – 2015. The crimes are: rape, robbery, grievous hurt and wound (GHW), theft, vehicle theft, assault, murder, burglary, car stealing, unlawful possession, breach of peace, broken store and unlawful escape. Prior to the statistical analysis, the frequencies of crimes in each category were averaged over the twenty-one years of the study period to control for anomalous years in which there may have been an unexplained spike or fall in crime levels. The value for each crime was converted to a crime rate per 100,000 population of the Area Council (AC), calculated as [4]:

$$\text{Crime rate} = \frac{\text{Number of crimes committed}}{\text{Population of the AC}} \times 100{,}000$$
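The conversion described in this section can be sketched in a few lines of Python. This is only an illustration: the function name, the yearly counts and the population figure are hypothetical and are not taken from the police records.

```python
def crime_rate_per_100k(yearly_counts, population):
    """Average the yearly counts, then scale to a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    mean_count = sum(yearly_counts) / len(yearly_counts)
    return mean_count / population * 100_000

# Hypothetical counts of one crime type over three years in one Area Council.
counts = [30, 50, 40]
population = 200_000  # hypothetical population of the Area Council
rate = crime_rate_per_100k(counts, population)
print(rate)
```

Averaging before scaling mirrors the order of operations stated in the text; with a fixed population figure the result is the same if the scaling is applied first.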

3. Methodology

3.1. Principal Component Analysis

Principal component analysis (PCA) is one of the most frequently used multivariate data analysis methods. It can be considered a projection method which projects observations from a p-dimensional space with p variables to a k-dimensional space (where k < p) so as to conserve the maximum amount of information (measured here through the total variance of the scatter plot) from the initial dimensions. The method transforms a set of highly correlated variables into a new set of variables which are uncorrelated but contain almost all the information in the original data. The new variables are often fewer in number than the original variables. The features of PCA are: it incorporates no error terms in its structure and hence is a purely mathematical procedure; it transforms a set of variables into a new set of uncorrelated variables; and it is appropriate when the variables are on an equal footing.

3.2. Derivation of PCA

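Given a p-dimensional random vector $X = (X_1, X_2, \ldots, X_p)'$ with mean vector $\mu$ and dispersion matrix $\Sigma$, PCA seeks a new set of variables $Z_1, Z_2, \ldots, Z_q$ (with q often fewer than p) so that:

$$Z_j = a_{1j}X_1 + a_{2j}X_2 + \cdots + a_{pj}X_p = a_j'X \tag{1}$$

where $a_j = (a_{1j}, a_{2j}, \ldots, a_{pj})'$ are coefficients. Thus the $Z_j$'s are linear combinations of X with coefficients $a_{ij}$, and

$$E(Z_j) = a_j'\mu \tag{2}$$

and

$$\mathrm{Var}(Z_j) = a_j'\Sigma a_j \tag{3}$$

The procedure tries to obtain $a_{ij}$ so that $Z_j$, the jth PC of X, has the following properties:

(i)      The Z's are orthogonal.

(ii)     Each Z captures the maximum variation remaining in X; hence we maximize the variation in Z subject to the constraint $a_j'a_j = 1$.

For instance, if $Z_1$ is the first PC, we seek $a_1$ such that

(i)      $\mathrm{Var}(Z_1) = a_1'\Sigma a_1$ is a maximum

(ii)     $a_1'a_1 = 1$

If $Z_2$ is the second PC, determined after $Z_1$ and uncorrelated with $Z_1$, we seek $a_2$ such that

(i)      $\mathrm{Var}(Z_2) = a_2'\Sigma a_2$ is a maximum

(ii)     $a_2'a_2 = 1$

(iii)    $a_1'a_2 = 0$, that is, $Z_1$ and $Z_2$ are orthogonal

The procedure continues in this way to select the jth PC such that

(i)      $\mathrm{Var}(Z_j) = a_j'\Sigma a_j$ is a maximum

(ii)     $a_j'a_j = 1$

(iii)    $a_i'a_j = 0$ for $i \neq j$, i.e. the $Z_j$'s are orthogonal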

3.3. The Procedure for Finding the First PC

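To find the first PC, we seek $a_1$ such that

$$Z_1 = a_1'X \tag{4}$$

is the PC of X subject to the constraints

(i)      $\mathrm{Var}(Z_1) = a_1'\Sigma a_1$ is a maximum

(ii)     $a_1'a_1 = 1$

To maximize $\mathrm{Var}(Z_1)$ subject to $a_1'a_1 = 1$, we define the Lagrangian function

$$L(a_1) = a_1'\Sigma a_1 - \lambda(a_1'a_1 - 1) \tag{5}$$

where $\lambda$ is the Lagrangian multiplier.

To maximize $L(a_1)$ we differentiate $L(a_1)$ partially with respect to $a_1$ and equate the result to zero. Thus,

$$\frac{\partial L}{\partial a_1} = 2\Sigma a_1 - 2\lambda a_1 = 0 \tag{6}$$

and

$$(\Sigma - \lambda I)a_1 = 0 \tag{7}$$

where $\lambda$ is an eigenvalue of $\Sigma$ and $a_1$ is the corresponding eigenvector.

The solution $a_1 = 0$ is trivial. Since $a_1$ cannot be zero (i.e. $a_1 \neq 0$), a non-trivial solution requires

$$|\Sigma - \lambda I| = 0 \tag{8}$$

and (7) implies that

$$\Sigma a_1 = \lambda a_1 \tag{9}$$

If (7) is to have a non-trivial solution, then because of (8), $\lambda$ must be a characteristic root of $\Sigma$.

Hence we have p characteristic roots and p vectors $a_i$, since $\Sigma$ is a p × p matrix. Let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the characteristic roots of $\Sigma$; then $\mathrm{Var}(Z_1) = a_1'\Sigma a_1 = a_1'\lambda a_1 = \lambda$, hence maximizing $\mathrm{Var}(Z_1)$ is equivalent to maximizing $\lambda$. That is, of the p $\lambda$'s we choose the largest, denoted $\lambda_1$, and $\mathrm{Var}(Z_1) = \lambda_1$.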
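As a small worked illustration of the eigenvalue problem derived in this subsection, consider a hypothetical 2 × 2 dispersion matrix. The matrix is chosen only for easy arithmetic and is not taken from the crime data.

```latex
\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
|\Sigma - \lambda I| = (2-\lambda)^2 - 1 = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1 .

\text{For } \lambda_1 = 3:\quad (\Sigma - 3I)a_1 = 0
\;\Rightarrow\; a_{11} = a_{21}, \qquad
a_1'a_1 = 1 \;\Rightarrow\; a_1 = \tfrac{1}{\sqrt{2}}(1,\ 1)' .

\mathrm{Var}(Z_1) = a_1'\Sigma a_1 = \tfrac{1}{2}(2 + 1 + 1 + 2) = 3 = \lambda_1 .
```

The variance of the first PC therefore equals the largest characteristic root, as stated above.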

3.4. The Procedure for Finding the Second PC

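Let $Z_2$ be the second PC of X; then $Z_2 = a_2'X$.

We seek $a_2$ such that

(i)      $\mathrm{Var}(Z_2) = a_2'\Sigma a_2$ is a maximum

(ii)     $a_2'a_2 = 1$

(iii)    $a_1'a_2 = 0$

The Lagrangian function is

$$L(a_2) = a_2'\Sigma a_2 - \lambda(a_2'a_2 - 1) - \phi\, a_1'a_2 \tag{10}$$

where $\lambda$ and $\phi$ are Lagrangian multipliers. Thus,

$$\frac{\partial L}{\partial a_2} = 2\Sigma a_2 - 2\lambda a_2 - \phi a_1 = 0 \tag{11}$$

Multiply by $a_1'$ to get

$$2a_1'\Sigma a_2 - 2\lambda a_1'a_2 - \phi a_1'a_1 = 0 \tag{12}$$

Since $a_1'\Sigma a_2 = \lambda_1 a_1'a_2 = 0$ and $a_1'a_1 = 1$, equation (12) gives $\phi = 0$, so (11) reduces to $\Sigma a_2 = \lambda a_2$. Thus $a_2$ is the eigenvector corresponding to the second largest characteristic root $\lambda_2$ of $\Sigma$, and $\mathrm{Var}(Z_2) = \lambda_2$.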
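The two procedures can be illustrated numerically. The sketch below is not the authors' computation: it uses power iteration with deflation, a swapped-in numerical technique, on a hypothetical 2 × 2 dispersion matrix, and all function names are illustrative.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def leading_eigenpair(m, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit eigenvector).

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    v = [1.0] + [0.0] * (len(m) - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, mat_vec(m, v)), v

# Hypothetical dispersion matrix, chosen only for illustration.
sigma = [[2.0, 1.0], [1.0, 2.0]]

# First PC: the unit vector a1 maximizing a1' Sigma a1.
lam1, a1 = leading_eigenpair(sigma)

# Deflation: subtract lam1 * a1 a1' so the next search is orthogonal to a1.
deflated = [[sigma[i][j] - lam1 * a1[i] * a1[j] for j in range(2)]
            for i in range(2)]
lam2, a2 = leading_eigenpair(deflated)

print("Var(Z1) =", round(lam1, 6), " a1 =", [round(x, 4) for x in a1])
print("Var(Z2) =", round(lam2, 6), " a2 =", [round(x, 4) for x in a2])
print("a1'a2   =", round(dot(a1, a2), 6))
```

Deflation is one simple way to enforce the orthogonality constraint (iii); in practice a library eigen-decomposition routine would normally be used instead.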

3.5. Interpretation of the Principal Components

The loading, or eigenvector $\alpha_j = (\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jp})$, is a measure of the importance of each measured variable for a given PC. When all elements are positive, the first component is a weighted average of the variables and is sometimes referred to as a measure of the overall crime rate. Likewise, the positive and negative coefficients in subsequent components may be regarded as type-of-crime components [7] and [6]. A plot of the first two or three loadings against each other enhances visual interpretation [9].

The score is a measure of the importance of a PC for an observation. The new PC observations $Y_{ij}$ are obtained simply by substituting the original variables $X_{ij}$ into the set of PCs. This gives

$$Y_{ij} = \alpha_{j1}X_{i1} + \alpha_{j2}X_{i2} + \cdots + \alpha_{jp}X_{ip} \tag{13}$$

$i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$
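The score formula can be sketched as follows. The observations and loading vectors are hypothetical values chosen for illustration, and the function name is an assumption rather than anything used in the paper.

```python
def pc_scores(data, loadings):
    """Return Y[i][j] = sum over k of loadings[j][k] * data[i][k]."""
    return [[sum(a * x for a, x in zip(alpha_j, row)) for alpha_j in loadings]
            for row in data]

# Hypothetical standardized observations: 3 areas (rows) by 2 variables.
X = [[1.0, 1.0],
     [-1.0, 0.5],
     [0.0, -1.5]]

# Hypothetical orthonormal loading vectors, one row per component.
r = 0.5 ** 0.5
alpha = [[r, r],
         [r, -r]]

Y = pc_scores(X, alpha)
for i, row in enumerate(Y, start=1):
    print(f"observation {i}:", [round(y, 4) for y in row])
```

Each row of `Y` holds the scores of one observation on the components, which is what a score plot displays for the first two components.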

3.6. The Proportion of Variance

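The proportion of variance tells us how well each PC explains the original variables. A measure of how well the first q PCs of Z explain the variation is given by:

$$\psi_q = \frac{\sum_{j=1}^{q}\lambda_j}{\sum_{j=1}^{p}\lambda_j} = \frac{\sum_{j=1}^{q}\mathrm{Var}(Z_j)}{\sum_{j=1}^{p}\mathrm{Var}(Z_j)}$$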

A cumulative proportion of explained variance is a useful criterion for determining the number of components to be retained in the analysis. A Scree plot provides a good graphical representation of the ability of the PCs to explain the variation in the data [2].
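As a worked check of the cumulative proportion, the three largest eigenvalues reported in Table 4 can be divided by the total variance. This sketch assumes that the analysis was run on the correlation matrix of the 13 crime variables, so that the total variance equals 13; the symbol $\psi_3$ is used here only for illustration.

```latex
\psi_3 = \frac{6.663 + 2.686 + 1.944}{13} = \frac{11.293}{13} \approx 0.8687
```

This agrees, up to rounding of the eigenvalues, with the 86.873 percent reported in Table 4.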

4. Analysis and Results

In this section, we present the results of the analysis of the data on the total number of crimes committed yearly from 1995 through 2015 in Gwagwalada Area Command, Abuja, Nigeria. The data were collected from the police headquarters, Abuja.

Table 1. Correlation matrix of Crime Types (per 10,000 Population) in Gwagwalada, Abuja.

| Crime | Statistic | Rape | Robbery | GHW | Theft | V. Theft | Assault | Murder |
|---|---|---|---|---|---|---|---|---|
| Rape | Pearson Correlation | 1 | .883** | .930** | .909** | .304 | .912** | .857** |
| | Sig. (2-tailed) | | .000 | .000 | .000 | .193 | .000 | .000 |
| Robbery | Pearson Correlation | .883** | 1 | .965** | .925** | .094 | .927** | .767** |
| | Sig. (2-tailed) | .000 | | .000 | .000 | .693 | .000 | .000 |
| GHW | Pearson Correlation | .930** | .965** | 1 | .980** | .227 | .980** | .822** |
| | Sig. (2-tailed) | .000 | .000 | | .000 | .337 | .000 | .000 |
| Theft | Pearson Correlation | .909** | .925** | .980** | 1 | .312 | .998** | .803** |
| | Sig. (2-tailed) | .000 | .000 | .000 | | .181 | .000 | .000 |
| V. Theft | Pearson Correlation | .304 | .094 | .227 | .312 | 1 | .298 | .249 |
| | Sig. (2-tailed) | .193 | .693 | .337 | .181 | | .202 | .290 |
| Assault | Pearson Correlation | .912** | .927** | .980** | .998** | .298 | 1 | .808** |
| | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .202 | | .000 |
| Murder | Pearson Correlation | .857** | .767** | .822** | .803** | .249 | .808** | 1 |
| | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .290 | .000 | |
| Burglary | Pearson Correlation | -.046 | -.089 | -.028 | -.016 | .083 | -.021 | .052 |
| | Sig. (2-tailed) | .846 | .708 | .908 | .945 | .729 | .928 | .829 |
| C/Stealing | Pearson Correlation | .306 | .101 | .231 | .310 | .998** | .296 | .237 |
| | Sig. (2-tailed) | .189 | .671 | .327 | .184 | .000 | .205 | .315 |
| Unlawful possession | Pearson Correlation | .055 | .314 | .321 | .322 | -.013 | .314 | .096 |
| | Sig. (2-tailed) | .818 | .177 | .167 | .166 | .957 | .177 | .687 |
| Breach of Public Peace | Pearson Correlation | -.007 | .129 | .206 | .222 | .070 | .210 | .076 |
| | Sig. (2-tailed) | .978 | .588 | .384 | .346 | .769 | .374 | .751 |
| Broken Store | Pearson Correlation | -.011 | .054 | .116 | .138 | .280 | .111 | -.078 |
| | Sig. (2-tailed) | .962 | .822 | .627 | .561 | .232 | .642 | .745 |
| Unlawful Escape | Pearson Correlation | .837** | .985** | .922** | .883** | .115 | .882** | .738** |
| | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .628 | .000 | .000 |

Table 1. Continued.

| Crime | Statistic | Burglary | C/Stealing | Unlawful possession | Breach of Public Peace | Broken Store | Unlawful Escape |
|---|---|---|---|---|---|---|---|
| Rape | Pearson Correlation | -.046 | .306 | .055 | -.007 | -.011 | .837** |
| | Sig. (2-tailed) | .846 | .189 | .818 | .978 | .962 | .000 |
| Robbery | Pearson Correlation | -.089 | .101 | .314 | .129 | .054 | .985** |
| | Sig. (2-tailed) | .708 | .671 | .177 | .588 | .822 | .000 |
| GHW | Pearson Correlation | -.028 | .231 | .321 | .206 | .116 | .922** |
| | Sig. (2-tailed) | .908 | .327 | .167 | .384 | .627 | .000 |
| Theft | Pearson Correlation | -.016 | .310 | .322 | .222 | .138 | .883** |
| | Sig. (2-tailed) | .945 | .184 | .166 | .346 | .561 | .000 |
| V. Theft | Pearson Correlation | .083 | .998** | -.013 | .070 | .280 | .115 |
| | Sig. (2-tailed) | .729 | .000 | .957 | .769 | .232 | .628 |
| Assault | Pearson Correlation | -.021 | .296 | .314 | .210 | .111 | .882** |
| | Sig. (2-tailed) | .928 | .205 | .177 | .374 | .642 | .000 |
| Murder | Pearson Correlation | .052 | .237 | .096 | .076 | -.078 | .738** |
| | Sig. (2-tailed) | .829 | .315 | .687 | .751 | .745 | .000 |
| Burglary | Pearson Correlation | 1 | .071 | .226 | .222 | .119 | -.094 |
| | Sig. (2-tailed) | | .767 | .338 | .346 | .616 | .692 |
| C/Stealing | Pearson Correlation | .071 | 1 | -.006 | .080 | .297 | .119 |
| | Sig. (2-tailed) | .767 | | .982 | .736 | .204 | .617 |
| Unlawful possession | Pearson Correlation | .226 | -.006 | 1 | .892** | .744** | .310 |
| | Sig. (2-tailed) | .338 | .982 | | .000 | .000 | .183 |
| Breach of Public Peace | Pearson Correlation | .222 | .080 | .892** | 1 | .849** | .090 |
| | Sig. (2-tailed) | .346 | .736 | .000 | | .000 | .707 |
| Broken Store | Pearson Correlation | .119 | .297 | .744** | .849** | 1 | .054 |
| | Sig. (2-tailed) | .616 | .204 | .000 | .000 | | .821 |
| Unlawful Escape | Pearson Correlation | -.094 | .119 | .310 | .090 | .054 | 1 |
| | Sig. (2-tailed) | .692 | .617 | .183 | .707 | .821 | |

**Correlation is significant at the 0.01 level (2-tailed).

Source: Derived from Statistics Department of the Nigeria Police Force, Abuja

The correlation matrix in Table 1 displays different levels of correlation between the crimes. There are strong positive relationships among robbery, rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape; between unlawful possession and breach of public peace; and between car stealing and vehicle theft. These relationships are significant at the 5% significance level (indeed at the 1% level, as flagged in Table 1), which means that the variables can be used to predict (explain) one another. Similarly, it was observed that the correlations between robbery and burglary, rape and burglary, rape and breach of public peace, rape and broken store, GHW and burglary, theft and burglary, assault and burglary, murder and broken store, and car stealing and unlawful possession were negative and insignificant.
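For readers who wish to reproduce entries of this kind, the following minimal sketch computes a Pearson correlation and the t statistic commonly used for its two-tailed significance test. The two series are hypothetical and are not the police data; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical yearly rates for two crime types (not the police data).
robbery = [1.0, 2.0, 3.0, 4.0, 5.0]
theft = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(robbery, theft)

# t statistic for testing zero correlation, with n - 2 degrees of freedom.
n = len(robbery)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(round(r, 4), round(t, 4))
```

The p-value in the "Sig. (2-tailed)" rows would then be obtained by comparing `t` with a t distribution on n - 2 degrees of freedom, a step omitted here to keep the sketch dependency-free.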

Table 2. KMO and Bartlett’s test.

| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | 0.594 |
| Bartlett’s Test of Sphericity | Approx. Chi-Square | 472.314 |
| | df | 78 |
| | Sig. | .000 |

The null hypothesis that the correlation matrix is an identity matrix was rejected at the 5% level of significance (Bartlett's test of sphericity: χ2 = 472.314, df = 78, p-value = .000). This implies that the correlations in the data set are appropriate for factor analysis. Also, the Kaiser-Meyer-Olkin statistic of 0.594, which is above the commonly used minimum of 0.5, suggests that the sampling is adequate for this analysis.
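The degrees of freedom in Table 2 can be checked from the number of variables. The sketch below also shows the usual form of Bartlett's statistic; the inputs to that second function are hypothetical, since the determinant of the correlation matrix and the sample size used in the test are not reported in the paper.

```python
import math

def bartlett_df(p):
    """Degrees of freedom of Bartlett's test of sphericity for p variables."""
    return p * (p - 1) // 2

def bartlett_chi_square(n, p, det_r):
    """Bartlett's statistic: -(n - 1 - (2p + 5) / 6) * ln|R|."""
    return -(n - 1 - (2 * p + 5) / 6) * math.log(det_r)

# Thirteen crime variables, as in the data description.
p = 13
print(bartlett_df(p))

# Hypothetical inputs, only to show how the statistic is evaluated.
print(round(bartlett_chi_square(n=50, p=3, det_r=0.5), 4))
```

With p = 13 the function returns 78, matching the df reported in Table 2.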

Table 3. Communalities.

| Crime | Initial | Extraction |
|---|---|---|
| Rape | 1.000 | .992 |
| Robbery | 1.000 | .995 |
| GHW | 1.000 | .991 |
| Theft | 1.000 | .965 |
| Vehicle theft | 1.000 | .985 |
| Assault | 1.000 | .964 |
| Murder | 1.000 | .762 |
| Burglary | 1.000 | .114 |
| C/Stealing | 1.000 | .989 |
| Unlawful possession | 1.000 | .929 |
| Breach of Public Peace | 1.000 | .937 |
| Broken Store | 1.000 | .862 |
| Unlawful Escape | 1.000 | .902 |

Extraction Method: Principal Component Analysis.

From Table 3, we see that Rape, Robbery and GHW, with communalities of .992, .995 and .991 respectively, were best represented in the common factor space, because a high proportion of their variances was explained by the principal components.

Fig. 1. Scree Plot.

Table 4. Eigenvalues.

Initial Eigenvalues

| Crime | Eigenvalue | Proportion (%) | Cumulative (%) |
|---|---|---|---|
| Rape | 6.663 | 51.251 | 51.251 |
| Robbery | 2.686 | 20.664 | 71.916 |
| GHW | 1.944 | 14.957 | 86.873 |
| Theft | .966 | 7.433 | 94.306 |
| Vehicle theft | .314 | 2.415 | 96.720 |
| Assault | .192 | 1.478 | 98.198 |
| Murder | .155 | 1.196 | 99.394 |
| Burglary | .037 | .288 | 99.681 |
| C/Stealing | .029 | .222 | 99.904 |
| Unlawful possession | .010 | .073 | 99.977 |
| Breach of Public Peace | .002 | .013 | 99.990 |
| Broken Store | .001 | .008 | 99.998 |
| Unlawful Escape | .000 | .002 | 100.000 |

The eigenvalues, proportions and cumulative proportions of the explained variance are displayed in Table 4. Considering the eigenvalue-one criterion, a commonly accepted rule under which it suffices to keep only PCs with eigenvalues larger than 1, together with the Scree plot in Figure 1, it is reasonable to retain the first three PCs, i.e. Rape, Robbery and GHW. These three PCs explain up to 86.873 percent of the total variability.
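The eigenvalue-one rule can be applied mechanically to the values in Table 4. The following is a minimal sketch; the function name and the default threshold argument are illustrative choices.

```python
# Eigenvalues as reported in Table 4, in decreasing order.
eigenvalues = [6.663, 2.686, 1.944, 0.966, 0.314, 0.192, 0.155,
               0.037, 0.029, 0.010, 0.002, 0.001, 0.000]

def retained_by_eigenvalue_one(values, threshold=1.0):
    """Return the eigenvalues that exceed the threshold (default 1)."""
    return [v for v in values if v > threshold]

kept = retained_by_eigenvalue_one(eigenvalues)
print(len(kept), kept)
```

Three eigenvalues exceed 1, while the fourth (0.966) falls just below the threshold, so the rule retains three components.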

Table 5. Eigenvectors: Component Factor Estimates.

| Crime | Principal Component 1 | Principal Component 2 | Principal Component 3 |
|---|---|---|---|
| Rape | .928 | -.252 | .070 |
| Robbery | .946 | -.140 | -.223 |
| GHW | .986 | -.069 | -.097 |
| Theft | .982 | -.034 | -.014 |
| Vehicle theft | .338 | .209 | .913 |
| Assault | .980 | -.053 | -.024 |
| Murder | .846 | -.212 | .024 |
| Burglary | -.004 | .332 | .795 |
| Car Stealing | .339 | .219 | .909 |
| Unlawful possession | .355 | .822 | -.357 |
| Breach of Public Peace | .250 | .909 | -.219 |
| Broken Store | .186 | .902 | .042 |
| Unlawful Escape | .918 | -.143 | -.199 |

Extraction Method: Principal Component Analysis.

a. 3 components extracted.

Table 5 reports the loadings of the three retained PCs, which together explain 86.873 per cent of the total variability of the data set.

Component 1 (Gwagwalada) has positive loadings on almost all the crimes recorded (only burglary is marginally negative), but it mainly identifies rape, robbery, GHW, theft, assault, murder and unlawful escape.

Component 2 (Kwali) identifies unlawful possession, breach of peace and broken store as the major crimes committed. It has a negative relationship with rape, robbery, grievous hurt and wound (GHW), theft, assault and murder.

Component 3 (Kuje) identifies vehicle theft, car stealing and burglary as the concentrated offences. This implies that vehicle theft, car stealing and burglary are the predominant offences in Kuje.
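One way to sanity-check a column of Table 5 is to sum its squared loadings. The sketch assumes that the reported loadings are eigenvector elements scaled by the square root of the eigenvalue, a common convention for component matrices that the paper does not state explicitly; under that assumption the sum for a component should be close to its eigenvalue in Table 4.

```python
# First-component loadings as reported in Table 5 (13 crimes, in table order).
pc1_loadings = [0.928, 0.946, 0.986, 0.982, 0.338, 0.980, 0.846,
                -0.004, 0.339, 0.355, 0.250, 0.186, 0.918]

# Sum of squared loadings of the component; under the assumed scaling this
# approximates the component's eigenvalue (6.663 for the first component).
ss_loadings = sum(a * a for a in pc1_loadings)
print(round(ss_loadings, 3))
```

Small differences from the tabulated eigenvalue are expected because the loadings are rounded to three decimals.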

Fig. 2. Score plot and loading of the first and second Principal Component (PC).

In Figure 2, the Area Council at the upper side shows a tendency towards rape, GHW, robbery, assault, murder, unlawful escape and theft; therefore Gwagwalada Area Council has the highest prevalence of these crimes. Kwali Area Council, located at the lower part of Figure 2, shows a tendency towards breach of peace, broken store and unlawful possession, while Kuje Area Council, located at the right side, shows a high tendency towards vehicle theft, car stealing and burglary. The score plot classifies the crimes into two groups: (1) the concentrated crimes, consisting of rape, GHW, car stealing, robbery, murder and vehicle theft, and (2) the less concentrated crimes, consisting of assault, burglary, breach of peace and unlawful possession.

5. Conclusion

The following conclusions are deduced from the analysis. There is a strong positive relationship among robbery, rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape; the relationships are also significant, which means that these variables can be used to predict (explain) one another. It was also observed that the correlations between robbery and burglary, and between rape and both breach of public peace and broken store, were negative and insignificant. The Area Council with the highest crime rate is Gwagwalada, Kuje has a moderate crime rate, while Kwali Area Council has the lowest crime rate in the Area Command. Three PCs (Rape, Robbery and GHW) that explain about 86.873 per cent of the total variability of the data set are suggested to be retained.

The score plot has classified the crimes into two groups, namely (1) concentrated offences: rape, GHW, vehicle theft, theft, car stealing, robbery and murder, and (2) less concentrated offences: assault, burglary, breach of peace and unlawful possession. Based on this, the components have geographically divided Gwagwalada Area Command between the north and the south in relation to the crime classifications. The southern part of the Area Command contains more assault, breach of peace, theft and vehicle theft, while the northern part has a prevalence of concentrated crimes such as rape, GHW, robbery, murder and car stealing. This will help in identifying the distribution of crimes in Gwagwalada, Abuja, allowing investors to measure the level of risk and to plan preventive measures for safeguarding their investments.


References

  1. Alemika, E. E. O. and Chukwuma, I. C. (2005), Designing indicators of safety and justice: lessons from the CLEEN Foundation's National Crime Victims Surveys in Nigeria.
  2. Cattell, R. B. (1966), The Scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
  3. Kendall Williams et al. (2004), A Multivariate Statistical Analysis of Crime Rate in US Cities.
  4. Kpedekpo, G. M. C. and Arya, P. L. (1981), Social and Economic Statistics for Africa. George Allen and Unwin, London.
  5. Olufolabo, O. O., Akintande, O. J. and Ekum, M. I. (2015), Analyzing the Distribution of Crimes in Oyo State Using Principal Component Analysis. IOSR Journal of Mathematics (IOSR-JM), Vol. 11, Issue 3.
  6. Osuji, G. A., Obubu, M. and Obiora-Iluono, H. O. (2015), An Investigation on Crime Rate in Southeastern Nigeria. European Journal of Statistics and Probability, Vol. 3, No. 4, 1-9.
  7. Printcom (2003). http://support.sas.com/onlinedoc/912/getDoc/common.hlp/images/copyrite.htm
  8. Rencher, A. C. (2002), Methods of Multivariate Analysis. 2nd edition, John Wiley & Sons, New York.
  9. Shehu, U. G., Dikko, H. G. and Yusuf, B. (2012), Analysis of Crime Data Using Principal Component Analysis: A Case Study of Katsina State. CBN Journal of Applied Statistics, Vol. 3, No. 2: 39.
  10. Soren, H. (2006), Example of multivariate analysis in R – Principal Component Analysis (PCA).
  11. Yusuf Bello et al. (2014), Principal Component Analysis of Crime Victimizations in Katsina Senatorial Zone. International Journal of Science and Technology, Vol. 3, No. 4.
