Principal Component Analysis of Crime Data in Gwagwalada Area Command, Abuja from 1995 – 2015
Nasiru Mukaila Olakorede, Samuel Olorunfemi Adams, Samuel Olayemi Olanrewaju
Department of Statistics, Faculty of Science, University of Abuja, Abuja, Nigeria
To cite this article:
Nasiru Mukaila Olakorede, Samuel Olorunfemi Adams, Samuel Olayemi Olanrewaju. Principal Component Analysis of Crime Data in Gwagwalada Area Command, Abuja from 1995 – 2015. American Journal of Theoretical and Applied Statistics. Vol. 6, No. 1, 2017, pp. 38-43. doi: 10.11648/j.ajtas.20170601.15
Received: July 15, 2016; Accepted: July 25, 2016; Published: February 6, 2017
Abstract: This paper analyses Abuja crime data consisting of the averages of thirteen major crimes reported to the police for the period 1995 – 2015. Correlation analysis and principal component analysis (PCA) were employed to explain the correlation between the crimes and to determine the distribution of the crimes over the three Area Councils under the Gwagwalada Area Command. The results show a significant correlation between robbery and rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape. Gwagwalada Area Council has the highest overall crime rate in the Area Command, i.e. for rape, robbery, GHW, theft, assault, murder and unlawful escape. Unlawful possession, breach of peace and broken store are more prevalent in Kwali Area Council, while vehicle theft, car stealing and burglary are more prevalent in Kuje Area Council. The PCA suggests retaining three components (Rape, Robbery and GHW) that explain about 86.873 percent of the total variability of the data set.
Keywords: Multivariate Analysis, Factor Analysis, PCA, GHW, DPHs
1. Introduction
In recent times there has been an increase in reported cases of crime across the country and consequently in the FCT, Abuja, which raises the question of what categories of crime are committed in the Federal Capital Territory. The scope of crime prevention has grown considerably in the last few years. What was previously the sole concern of the police and the private security industry now involves real estate developers, car manufacturers, residents' groups and those who build public facilities such as society offices and shopping centers, all of which calls for continuously improved ways of preventing crime. Criminal victimization has serious consequences for citizens and society because a high standard of living is undermined by a high level of criminal victimization [1]. Crime in the city of Abuja has increased since the relocation of the federal capital to Abuja. This paper therefore analyses the crime rate in Abuja statistically, since crime is one of the persistent problems that bedevil mankind. Principal component analysis (PCA) is very useful in crime analysis because of its robustness in data reduction and in determining the overall criminality in a given geographical area. PCA is a data analysis tool that is usually used to reduce the dimensionality (number of variables) of a large set of interrelated variables while retaining as much of the information (variation) as possible.
Previous research has analyzed crime data using principal component analysis (PCA). [3] classified US cities as safe or unsafe using the multivariate methods of principal components, factor analysis and discriminant analysis, reducing the 14 distinct variables that can affect the crime rate of a city to 6 and 7 important variables that show a high correlation with all the variables. In [11], principal component analysis reduced the number of variables to 6 and accounted for 86% of the total variance, while factor analysis reduced the number to 7 and accounted for 79.7% of the total variance. [5] analyzed Oyo State crime data consisting of 8 major crimes reported to the police between 1996 and 2014. They employed correlation analysis and principal component analysis to explain the correlation between the crimes in the state. Their results showed a significant correlation, and the principal component analysis suggested retaining 6 components that explained about 83.79% of the total variability of the data set. [10] employed face-to-face personal interviews using a stratified multistage random selection procedure. They applied PCA to analyze the spatial pattern of criminal victimization in the 11 Local Government Areas of Katsina Senatorial Zone and found that Batsari has the highest overall average victimization while Rimi has the lowest. [8] used Katsina State data consisting of the averages of eight major crimes reported to the police for the period 2006 – 2008. Correlation analysis and PCA were employed to explain the correlation between the crimes and to determine the distribution of the crimes over the local government areas of the state. The results showed a significant correlation between robbery, theft and vehicle theft, and the PCA suggested retaining four components that explain about 78.94 percent of the total variability of the data set.
This paper explores the use of correlation analysis and PCA for effective crime control and prevention. The objectives of this research are to determine the bivariate association between pairs of crime types, to identify the crimes that are dominant in FCT Abuja, and to reduce the dimension of the data using principal component analysis (PCA).
PCA offers a tool for reducing the dimensionality of a very large data set and for determining the areas with the highest overall crime rate. These tools, if properly implemented, can help address many of the complex criminal problems that have bedeviled the country in general and Abuja in particular.
2. Data Description
The crime data for the period 1995 – 2015 for the 3 Divisional Police Headquarters (DPHs) in the Gwagwalada Area Command, Abuja, were collected officially from the FCT Police Headquarters, Abuja. For ease of statistical analysis and interpretation, the 3 DPHs were categorized according to the three Area Councils (AC) under Gwagwalada Area Command, i.e. Gwagwalada, Kuje and Kwali Area Councils. The data consist of thirteen major crimes reported to the police within the period 1995 – 2015. The crimes are: rape, robbery, grievous hurt and wound (GHW), theft, vehicle theft, assault, murder, burglary, car stealing, unlawful possession, breach of peace, broken store and unlawful escape. Prior to the statistical analysis, the frequencies of crimes in each category were averaged over the twenty-one years of the study period to control for anomalous years in which there may have been an unexplained spike or fall in crime levels. The value for each crime was converted to a crime rate per 100,000 population of the Area Council (AC), which was calculated as [4]:
$$\text{Crime rate} = \frac{\text{Number of crimes committed}}{\text{Population of the AC}} \times 100{,}000$$
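The averaging and rate conversion described in this section can be sketched in Python as follows. This is only an illustrative sketch: the function name `crime_rate_per_100k` and the yearly counts and population used in the example are hypothetical and are not taken from the police records analysed in this paper.

```python
def crime_rate_per_100k(yearly_counts, population):
    """Average the yearly counts, then express the mean per 100,000 residents."""
    if not yearly_counts or population <= 0:
        raise ValueError("need at least one yearly count and a positive population")
    mean_count = sum(yearly_counts) / len(yearly_counts)
    return mean_count * 100_000 / population


# Hypothetical figures for one crime in one Area Council (assumed, not real data).
example_counts = [40, 60, 50]
example_population = 200_000
print(crime_rate_per_100k(example_counts, example_population))
```

Averaging before converting to a rate keeps a single unusual year from dominating the figure that enters the correlation and principal component analyses.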
3. Methodology
3.1. Principal Component Analysis
Principal component analysis (PCA) is one of the most frequently used multivariate data analysis methods. It can be considered a projection method which projects observations from a p-dimensional space with p variables to a k-dimensional space (where k < p) so as to conserve the maximum amount of information (information is measured here through the total variance of the scatter plots) from the initial dimensions. The method transforms a set of highly correlated variables into a new set of variables which are uncorrelated but contain almost all the information in the original data. The new variables are often fewer in number than the original variables. The features of PCA are: it incorporates no error terms in its structure and hence is a mathematical procedure; it transforms a set of variables into a new set of uncorrelated variables; and it is appropriate when the variables are on an equal footing.
3.2. Derivation of PCA
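Given a $p$-dimensional random vector $X = (X_1, X_2, \ldots, X_p)'$ with mean vector $\mu$ and dispersion matrix $\Sigma$, PCA seeks a new set of variables $Z_1, Z_2, \ldots, Z_k$ ($k$ often fewer than $p$) so that:
$$Z_j = a_{1j}X_1 + a_{2j}X_2 + \cdots + a_{pj}X_p = a_j'X \quad (1)$$
where the $a_{ij}$ are coefficients. Thus the $Z_j$'s are linear combinations of the $X_i$ with coefficient vectors $a_j = (a_{1j}, a_{2j}, \ldots, a_{pj})'$, and
$$\mathrm{Var}(Z_j) = a_j'\Sigma a_j \quad (2)$$
and
$$\mathrm{Cov}(Z_j, Z_l) = a_j'\Sigma a_l, \quad j \neq l \quad (3)$$
The procedure tries to obtain the $a_{ij}$ so that $Z_j$, the $j$-th PC of $X$, has the following properties:
(i) The $Z$'s are orthogonal.
(ii) Each $Z$ captures the maximum variance remaining in $X$; hence we maximize the variation in $Z$ subject to the constraint $a_j'a_j = 1$.
For instance, if $Z_1$ is the first PC we seek $a_1$ such that
(i) $\mathrm{Var}(Z_1) = a_1'\Sigma a_1$ is a maximum
(ii) $a_1'a_1 = 1$
If $Z_2$ is the second PC, determined after $Z_1$ and uncorrelated with $Z_1$, we seek $a_2$ such that
(i) $\mathrm{Var}(Z_2) = a_2'\Sigma a_2$ is a maximum
(ii) $a_2'a_2 = 1$
(iii) $a_1'a_2 = 0$, that is, $Z_1$ and $Z_2$ are orthogonal
The procedure continues in this way to select the $j$-th PC such that
(i) $\mathrm{Var}(Z_j) = a_j'\Sigma a_j$ is a maximum
(ii) $a_j'a_j = 1$
(iii) $a_l'a_j = 0$ for all $l < j$, i.e., the $Z_j$'s are orthogonal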
3.3. The Procedure for Finding the First PC
To find the first PC, we seek $a_1$ such that
$$Z_1 = a_1'X \quad (4)$$
is the first PC of $X$, subject to the constraints
(i) $\mathrm{Var}(Z_1) = a_1'\Sigma a_1$ is a maximum
(ii) $a_1'a_1 = 1$
To maximize $\mathrm{Var}(Z_1)$ subject to $a_1'a_1 = 1$, we define the Lagrangian function
$$L(a_1) = a_1'\Sigma a_1 - \lambda(a_1'a_1 - 1) \quad (5)$$
where $\lambda$ is the Lagrangian multiplier.
To maximize $L(a_1)$ we differentiate $L(a_1)$ partially with respect to $a_1$ and equate the result to zero. Thus,
$$\frac{\partial L}{\partial a_1} = 2\Sigma a_1 - 2\lambda a_1 = 0 \quad (6)$$
and
$$(\Sigma - \lambda I)a_1 = 0 \quad (7)$$
where $\lambda$ is an eigenvalue of $\Sigma$ and $a_1$ is the corresponding eigenvector.
The solution $a_1 = 0$ is trivial. Since $a_1$ cannot be zero (i.e. $a_1 \neq 0$), a non-trivial solution requires
$$|\Sigma - \lambda I| = 0 \quad (8)$$
which implies that
$$\Sigma a_1 = \lambda a_1 \quad (9)$$
If (7) is to have a non-trivial solution then, because of (8), $\lambda$ must be a characteristic root of $\Sigma$.
Hence we will have $p$ characteristic roots and $p$ vectors $a_i$, since $\Sigma$ is a $p \times p$ matrix. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$ be the characteristic roots of $\Sigma$; then $\mathrm{Var}(Z_1) = a_1'\Sigma a_1 = a_1'\lambda a_1 = \lambda$, hence maximizing $\mathrm{Var}(Z_1)$ is equivalent to choosing the largest $\lambda$. That is, of the $p$ roots we choose the maximum, so that $\mathrm{Var}(Z_1) = \lambda_1$.
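The result that the first PC is the unit eigenvector belonging to the largest characteristic root can be illustrated numerically. The sketch below uses power iteration, which is one simple way of finding the dominant eigenvector and is not necessarily the algorithm used by the statistical package in this study. The 2 x 2 dispersion matrix and the helper names `mat_vec`, `normalise` and `first_pc` are assumptions made purely for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def normalise(vector):
    """Rescale a vector to unit length, enforcing the constraint a'a = 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def first_pc(sigma, iterations=200):
    """Power iteration: repeatedly apply sigma and renormalise."""
    a = normalise([1.0] + [0.0] * (len(sigma) - 1))
    for _ in range(iterations):
        a = normalise(mat_vec(sigma, a))
    # Var(Z_1) = a' Sigma a, which equals the largest eigenvalue at convergence.
    variance = sum(x * y for x, y in zip(a, mat_vec(sigma, a)))
    return variance, a


# Hypothetical dispersion matrix with eigenvalues 3 and 1.
sigma = [[2.0, 1.0], [1.0, 2.0]]
lam1, a1 = first_pc(sigma)
print(round(lam1, 6), [round(x, 4) for x in a1])
```

For this matrix the iteration settles on equal weights for the two variables, and the variance of the resulting component equals the largest root, in line with the derivation above.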
3.4. The Procedure for Finding the Second PC
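Let $Z_2$ be the second PC of $X$; then $Z_2 = a_2'X$.
We seek $a_2$ such that
(i) $\mathrm{Var}(Z_2) = a_2'\Sigma a_2$ is a maximum
(ii) $a_2'a_2 = 1$
(iii) $a_1'a_2 = 0$
The Lagrangian function is
$$L(a_2) = a_2'\Sigma a_2 - \lambda(a_2'a_2 - 1) - \phi\, a_1'a_2 \quad (10)$$
where $\phi$ is a second Lagrangian multiplier. Thus,
$$\frac{\partial L}{\partial a_2} = 2\Sigma a_2 - 2\lambda a_2 - \phi a_1 = 0 \quad (11)$$
Multiply by $a_1'$ to get
$$2a_1'\Sigma a_2 - 2\lambda a_1'a_2 - \phi a_1'a_1 = 0 \quad (12)$$
Since $a_1'a_2 = 0$, $a_1'a_1 = 1$ and $a_1'\Sigma a_2 = \lambda_1 a_1'a_2 = 0$, equation (12) gives $\phi = 0$. Equation (11) then reduces to $\Sigma a_2 = \lambda a_2$, so $a_2$ is the eigenvector belonging to the second largest characteristic root and $\mathrm{Var}(Z_2) = \lambda_2$.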
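The constraints on the second PC can be checked on a small example. The sketch below uses the closed-form eigen-decomposition of a symmetric 2 x 2 matrix, so it applies only to that special case. The matrix values and the helper names `eig_sym_2x2` and `quad_form` are hypothetical and serve only to illustrate that the two coefficient vectors are orthogonal, that the cross term between them vanishes, and that the variance of the second component equals the second characteristic root.

```python
import math


def eig_sym_2x2(sigma):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, c]], b != 0."""
    (a, b), (_, c) = sigma
    centre = (a + c) / 2.0
    half_gap = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    pairs = []
    for lam in (centre + half_gap, centre - half_gap):
        # (b, lam - a) solves (sigma - lam * I) v = 0 when b is non-zero.
        v0, v1 = b, lam - a
        norm = math.hypot(v0, v1)
        pairs.append((lam, (v0 / norm, v1 / norm)))
    return pairs


def quad_form(u, sigma, v):
    """Compute u' Sigma v for 2-vectors u and v."""
    return sum(u[i] * sigma[i][j] * v[j] for i in range(2) for j in range(2))


# Hypothetical dispersion matrix (assumed values).
sigma = [[4.0, 2.0], [2.0, 3.0]]
(lam1, a1), (lam2, a2) = eig_sym_2x2(sigma)

orthogonality = a1[0] * a2[0] + a1[1] * a2[1]  # constraint (iii): a_1'a_2
cross_term = quad_form(a1, sigma, a2)          # a_1' Sigma a_2, i.e. Cov(Z_1, Z_2)
var_z2 = quad_form(a2, sigma, a2)              # Var(Z_2)
print(round(orthogonality, 10), round(cross_term, 10), round(var_z2 - lam2, 10))
```

All three printed quantities should be zero up to rounding, which is the numerical counterpart of the second multiplier vanishing in the derivation.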
3.5. Interpretation of the Principal Components
The loading, or eigenvector, $\alpha_j = (\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jp})'$ is a measure of the importance of each measured variable for a given PC. When all elements are positive, the first component is a weighted average of the variables and is sometimes referred to as a measure of the overall crime rate. Likewise, the positive and negative coefficients in subsequent components may be regarded as type-of-crime components [7] and [6]. The plot of the first two or three loadings against each other enhances visual interpretation [9].
The score is a measure of the importance of a PC for an observation. The new PC observations $Y_{ij}$ are obtained simply by substituting the original variables $X_{ij}$ into the set of the first PCs. This gives
$$Y_{ij} = \alpha_{j1}X_{i1} + \alpha_{j2}X_{i2} + \cdots + \alpha_{jp}X_{ip} \quad (13)$$
$i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$
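Equation (13) is a dot product between an observation and a loading vector, which the following sketch makes explicit. The two observations and the two loading vectors are hypothetical values chosen only so that the arithmetic is easy to follow, and the function name `pc_scores` is an assumption of this illustration.

```python
def pc_scores(data, loadings):
    """Return Y[i][j] = sum over k of loadings[j][k] * data[i][k], as in equation (13)."""
    return [
        [sum(alpha * x for alpha, x in zip(alpha_j, row)) for alpha_j in loadings]
        for row in data
    ]


# Two hypothetical observations on p = 2 variables (assumed values).
data = [[1.0, 2.0], [3.0, 0.0]]
# Two hypothetical orthonormal loading vectors, one per component.
loadings = [[0.6, 0.8], [0.8, -0.6]]

scores = pc_scores(data, loadings)
for row in scores:
    print([round(value, 4) for value in row])
```

Each row of `scores` holds the coordinates of one observation on the new component axes, which are the quantities displayed in a score plot.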
3.6. The Proportion of Variance
The proportion of variance indicates how much of the variation in the original variables each PC explains. A measure of how well the first $q$ PCs of $Z$ explain the variation is given by:
$$\psi_q = \frac{\sum_{j=1}^{q}\lambda_j}{\sum_{j=1}^{p}\lambda_j}$$
A cumulative proportion of explained variance is a useful criterion for determining the number of components to be retained in the analysis. A Scree plot provides a good graphical representation of the ability of the PCs to explain the variation in the data [2].
4. Analysis and Results
In this section, we present the results of the analysis of the data on the total number of crimes committed yearly from 1995 through 2015 in Gwagwalada Area Command, Abuja, Nigeria. The data were collected from the police headquarters, Abuja.
Table 1. Correlation matrix of the crimes (columns Rape to Murder)
Crime | Statistic | Rape | Robbery | GHW | Theft | V. Theft | Assault | Murder
Rape | Pearson Correlation | 1 | .883** | .930** | .909** | .304 | .912** | .857**
 | Sig. (2-tailed) |  | .000 | .000 | .000 | .193 | .000 | .000
Robbery | Pearson Correlation | .883** | 1 | .965** | .925** | .094 | .927** | .767**
 | Sig. (2-tailed) | .000 |  | .000 | .000 | .693 | .000 | .000
GHW | Pearson Correlation | .930** | .965** | 1 | .980** | .227 | .980** | .822**
 | Sig. (2-tailed) | .000 | .000 |  | .000 | .337 | .000 | .000
Theft | Pearson Correlation | .909** | .925** | .980** | 1 | .312 | .998** | .803**
 | Sig. (2-tailed) | .000 | .000 | .000 |  | .181 | .000 | .000
V. Theft | Pearson Correlation | .304 | .094 | .227 | .312 | 1 | .298 | .249
 | Sig. (2-tailed) | .193 | .693 | .337 | .181 |  | .202 | .290
Assault | Pearson Correlation | .912** | .927** | .980** | .998** | .298 | 1 | .808**
 | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .202 |  | .000
Murder | Pearson Correlation | .857** | .767** | .822** | .803** | .249 | .808** | 1
 | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .290 | .000 | 
Burglary | Pearson Correlation | -.046 | -.089 | -.028 | -.016 | .083 | -.021 | .052
 | Sig. (2-tailed) | .846 | .708 | .908 | .945 | .729 | .928 | .829
C/Stealing | Pearson Correlation | .306 | .101 | .231 | .310 | .998** | .296 | .237
 | Sig. (2-tailed) | .189 | .671 | .327 | .184 | .000 | .205 | .315
Unlawful possession | Pearson Correlation | .055 | .314 | .321 | .322 | -.013 | .314 | .096
 | Sig. (2-tailed) | .818 | .177 | .167 | .166 | .957 | .177 | .687
Breach of Public Peace | Pearson Correlation | -.007 | .129 | .206 | .222 | .070 | .210 | .076
 | Sig. (2-tailed) | .978 | .588 | .384 | .346 | .769 | .374 | .751
Broken Store | Pearson Correlation | -.011 | .054 | .116 | .138 | .280 | .111 | -.078
 | Sig. (2-tailed) | .962 | .822 | .627 | .561 | .232 | .642 | .745
Unlawful Escape | Pearson Correlation | .837** | .985** | .922** | .883** | .115 | .882** | .738**
 | Sig. (2-tailed) | .000 | .000 | .000 | .000 | .628 | .000 | .000
Table 1 (continued). Correlation matrix of the crimes (columns Burglary to Unlawful Escape)
Crime | Statistic | Burglary | C/Stealing | Unlawful possession | Breach of Public Peace | Broken Store | Unlawful Escape
Rape | Pearson Correlation | -.046 | .306 | .055 | -.007 | -.011 | .837**
 | Sig. (2-tailed) | .846 | .189 | .818 | .978 | .962 | .000
Robbery | Pearson Correlation | -.089 | .101 | .314 | .129 | .054 | .985**
 | Sig. (2-tailed) | .708 | .671 | .177 | .588 | .822 | .000
GHW | Pearson Correlation | -.028 | .231 | .321 | .206 | .116 | .922**
 | Sig. (2-tailed) | .908 | .327 | .167 | .384 | .627 | .000
Theft | Pearson Correlation | -.016 | .310 | .322 | .222 | .138 | .883**
 | Sig. (2-tailed) | .945 | .184 | .166 | .346 | .561 | .000
V. Theft | Pearson Correlation | .083 | .998** | -.013 | .070 | .280 | .115
 | Sig. (2-tailed) | .729 | .000 | .957 | .769 | .232 | .628
Assault | Pearson Correlation | -.021 | .296 | .314 | .210 | .111 | .882**
 | Sig. (2-tailed) | .928 | .205 | .177 | .374 | .642 | .000
Murder | Pearson Correlation | .052 | .237 | .096 | .076 | -.078 | .738**
 | Sig. (2-tailed) | .829 | .315 | .687 | .751 | .745 | .000
Burglary | Pearson Correlation | 1 | .071 | .226 | .222 | .119 | -.094
 | Sig. (2-tailed) |  | .767 | .338 | .346 | .616 | .692
C/Stealing | Pearson Correlation | .071 | 1 | -.006 | .080 | .297 | .119
 | Sig. (2-tailed) | .767 |  | .982 | .736 | .204 | .617
Unlawful possession | Pearson Correlation | .226 | -.006 | 1 | .892** | .744** | .310
 | Sig. (2-tailed) | .338 | .982 |  | .000 | .000 | .183
Breach of Public Peace | Pearson Correlation | .222 | .080 | .892** | 1 | .849** | .090
 | Sig. (2-tailed) | .346 | .736 | .000 |  | .000 | .707
Broken Store | Pearson Correlation | .119 | .297 | .744** | .849** | 1 | .054
 | Sig. (2-tailed) | .616 | .204 | .000 | .000 |  | .821
Unlawful Escape | Pearson Correlation | -.094 | .119 | .310 | .090 | .054 | 1
 | Sig. (2-tailed) | .692 | .617 | .183 | .707 | .821 | 
**Correlation is significant at the 0.01 level (2-tailed).
Source: Derived from Statistics Department of the Nigeria Police Force, Abuja
The correlation matrix in Table 1 displays different levels of correlation between the crimes. There are strong positive relationships among robbery, rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape, between unlawful possession and breach of public peace, and between c/stealing and vehicle theft. These relationships are significant at the 1% level (and therefore also at the 5% level), which means that these variables can be used to predict (explain) one another. Similarly, the correlations between robbery and burglary, rape and breach of public peace, rape and burglary, GHW and burglary, theft and burglary, assault and burglary, murder and broken store, and c/stealing and unlawful possession were negative and insignificant.
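The significance flags in Table 1 can be related to the usual t-test for a Pearson correlation, $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom. The sketch below assumes $n = 21$ yearly observations (1995 to 2015), which is an assumption of this illustration rather than a figure stated with the table, and uses the standard two-tailed critical values for 19 degrees of freedom (2.093 at the 5% level and 2.861 at the 1% level). The small data set passed to `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N_YEARS = 21          # assumed number of yearly observations
T_CRIT_5PCT = 2.093   # two-tailed critical value, 19 degrees of freedom
T_CRIT_1PCT = 2.861

# Correlations taken from Table 1: rape with robbery, rape with vehicle theft.
t_rape_robbery = t_statistic(0.883, N_YEARS)
t_rape_vtheft = t_statistic(0.304, N_YEARS)
print(round(t_rape_robbery, 2), t_rape_robbery > T_CRIT_1PCT)
print(round(t_rape_vtheft, 2), t_rape_vtheft > T_CRIT_5PCT)

# Hypothetical series showing how r itself is computed.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))
```

Under these assumptions the rape and robbery correlation clears the 1% threshold while the rape and vehicle theft correlation does not reach the 5% threshold, which agrees with the flags in Table 1.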
Table 2. KMO and Bartlett's test
Test | Statistic | Value
Kaiser-Meyer-Olkin Measure of Sampling Adequacy |  | 0.594
Bartlett's Test of Sphericity | Approx. Chi-Square | 472.314
 | df | 78
 | Sig. | .000
The null hypothesis that the correlation matrix is an identity matrix was rejected at the 5% level of significance (Bartlett's test of sphericity: χ^{2} = 472.314, df = 78, p-value = .000). This implies that the correlations in the data set are appropriate for factor analysis. In addition, the Kaiser-Meyer-Olkin statistic of 0.594 exceeds the commonly cited minimum of 0.5, which suggests that sampling adequacy is acceptable, though modest, for this analysis.
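For reference, Bartlett's statistic is usually computed as $\chi^2 = -\left[(n - 1) - (2p + 5)/6\right]\ln|R|$ with $p(p-1)/2$ degrees of freedom, where $R$ is the correlation matrix. The sketch below reproduces the degrees of freedom reported for the thirteen crimes and evaluates the statistic for a two-variable illustration in which $|R| = 1 - r^2$. The sample size of 21 and the function name `bartlett_sphericity` are assumptions; the chi-square value of the full 13 x 13 matrix cannot be reproduced here because it requires the determinant of the complete correlation matrix.

```python
import math


def bartlett_sphericity(n_obs, n_vars, log_det_r):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity."""
    df = n_vars * (n_vars - 1) // 2
    chi_square = -((n_obs - 1) - (2 * n_vars + 5) / 6.0) * log_det_r
    return chi_square, df


# Degrees of freedom for the thirteen crimes; the log-determinant is a placeholder.
_, df_thirteen = bartlett_sphericity(21, 13, 0.0)
print(df_thirteen)

# Two-variable illustration with r = .883 (rape and robbery), where |R| = 1 - r^2.
r = 0.883
chi_two, df_two = bartlett_sphericity(21, 2, math.log(1.0 - r * r))
print(round(chi_two, 2), df_two)
```

The degrees of freedom depend only on the number of variables, which is why the value of 78 in Table 2 follows directly from having thirteen crimes.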
Table 3. Communalities
Crime | Initial | Extraction |
Rape | 1.000 | .992 |
Robbery | 1.000 | .995 |
GHW | 1.000 | .991 |
Theft | 1.000 | .965 |
Vehicle theft | 1.000 | .985 |
Assault | 1.000 | .964 |
Murder | 1.000 | .762 |
Burglary | 1.000 | .114 |
C/Stealing | 1.000 | .989 |
Unlawful possession | 1.000 | .929 |
Breach of Public Peace | 1.000 | .937 |
Broken Store | 1.000 | .862 |
Unlawful Escape | 1.000 | .902 |
Extraction Method: Principal Component Analysis.
From Table 3, we see that rape, robbery and GHW, with communalities of .992, .995 and .991 respectively, are best represented in the common factor space, because a high proportion of their variance is explained by the principal components. Burglary, with a communality of .114, is the least well represented.
Table 4. Initial eigenvalues and proportion of variance explained
Crime | Eigenvalue | Proportion (%) | Cumulative (%) |
Rape | 6.663 | 51.251 | 51.251 |
Robbery | 2.686 | 20.664 | 71.916 |
GHW | 1.944 | 14.957 | 86.873 |
Theft | .966 | 7.433 | 94.306 |
Vehicle theft | .314 | 2.415 | 96.720 |
Assault | .192 | 1.478 | 98.198 |
Murder | .155 | 1.196 | 99.394 |
Burglary | .037 | .288 | 99.681 |
C/Stealing | .029 | .222 | 99.904 |
Unlawful possession | .010 | .073 | 99.977 |
Breach of Public Peace | .002 | .013 | 99.990 |
Broken Store | .001 | .008 | 99.998 |
Unlawful Escape | .000 | .002 | 100.000 |
The eigenvalues, proportions and cumulative proportions of the explained variance are displayed in Table 4. Considering the eigenvalue-one criterion and the scree plot in figure 1, it would be reasonable to retain the first three PCs, i.e. the rows labelled Rape, Robbery and GHW in Table 4. A commonly accepted rule says that it suffices to keep only PCs with eigenvalues larger than 1, so the first 3 PCs can be retained to explain up to 86.873 percent of the total variability.
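The eigenvalue-one rule and the cumulative proportion can be recomputed from the eigenvalues listed in Table 4. The sketch below assumes that the analysis was carried out on the correlation matrix, so that the total variance equals the number of variables (13). Because the tabulated eigenvalues are rounded to three decimals, the recomputed percentages agree with Table 4 only to about two decimal places. The function name `cumulative_percent` is an assumption of this illustration.

```python
# Eigenvalues as printed in Table 4 (rounded to three decimals).
eigenvalues = [6.663, 2.686, 1.944, 0.966, 0.314, 0.192, 0.155,
               0.037, 0.029, 0.010, 0.002, 0.001, 0.000]
n_variables = len(eigenvalues)  # total variance of 13 standardized variables


def cumulative_percent(values, total):
    """Running share of the total variance, in percent."""
    running, shares = 0.0, []
    for value in values:
        running += value
        shares.append(100.0 * running / total)
    return shares


cumulative = cumulative_percent(eigenvalues, n_variables)
retained = sum(1 for value in eigenvalues if value > 1.0)  # eigenvalue-one rule
print(retained, round(cumulative[retained - 1], 2))
```

With these inputs three components pass the eigenvalue-one rule and together account for roughly 86.87 percent of the variance, in line with the figure quoted above.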
Table 5. Component matrix
Crime | PC 1 | PC 2 | PC 3 |
Rape | .928 | -.252 | .070 |
Robbery | .946 | -.140 | -.223 |
GHW | .986 | -.069 | -.097 |
Theft | .982 | -.034 | -.014 |
Vehicle theft | .338 | .209 | .913 |
Assault | .980 | -.053 | -.024 |
Murder | .846 | -.212 | .024 |
Burglary | -.004 | .332 | .795 |
Car Stealing | .339 | .219 | .909 |
Unlawful possession | .355 | .822 | -.357 |
Breach of Public Peace | .250 | .909 | -.219 |
Broken Store | .186 | .902 | .042 |
Unlawful Escape | .918 | -.143 | -.199 |
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Table 5 concentrates on the three retained PCs, which explain 86.873 per cent of the total variability of the data set.
Component 1 (Gwagwalada): has positive loadings on nearly all the crimes recorded (only burglary has a slightly negative loading), but it mainly identifies rape, robbery, GHW, theft, assault, murder and unlawful escape.
Component 2 (Kwali): identifies unlawful possession, breach of peace and broken store as the major crimes committed. It has a negative relationship with rape, robbery, grievous hurt and wound (GHW), theft, assault and murder.
Component 3 (Kuje): identifies vehicle theft, car stealing and burglary as the concentrated offences. This implies that vehicle theft, car stealing and burglary are the everyday offences in Kuje.
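The grouping of crimes by component can be read off the component matrix in Table 5 by listing the crimes whose absolute loading on a component exceeds a cut-off. The sketch below uses a cut-off of 0.75, which is an assumption chosen for illustration and not a threshold stated in this paper. It also checks that the sum of squared loadings on the first component approximately reproduces the first eigenvalue in Table 4. The helper names `sum_of_squares` and `dominant_crimes` are hypothetical.

```python
# Loadings copied from Table 5: (PC 1, PC 2, PC 3) for each crime.
loadings = {
    "Rape": (0.928, -0.252, 0.070),
    "Robbery": (0.946, -0.140, -0.223),
    "GHW": (0.986, -0.069, -0.097),
    "Theft": (0.982, -0.034, -0.014),
    "Vehicle theft": (0.338, 0.209, 0.913),
    "Assault": (0.980, -0.053, -0.024),
    "Murder": (0.846, -0.212, 0.024),
    "Burglary": (-0.004, 0.332, 0.795),
    "Car Stealing": (0.339, 0.219, 0.909),
    "Unlawful possession": (0.355, 0.822, -0.357),
    "Breach of Public Peace": (0.250, 0.909, -0.219),
    "Broken Store": (0.186, 0.902, 0.042),
    "Unlawful Escape": (0.918, -0.143, -0.199),
}


def sum_of_squares(component):
    """Sum of squared loadings on one component (0-based index)."""
    return sum(row[component] ** 2 for row in loadings.values())


def dominant_crimes(component, threshold=0.75):
    """Crimes whose absolute loading on the component reaches the assumed cut-off."""
    return [crime for crime, row in loadings.items() if abs(row[component]) >= threshold]


print(round(sum_of_squares(0), 3))
for index in range(3):
    print(index + 1, dominant_crimes(index))
```

Under this cut-off the three lists coincide with the crimes named for Components 1, 2 and 3 above.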
In figure 2, the Area Council at the upper side shows a tendency towards rape, GHW, robbery, assault, murder, unlawful escape and theft; therefore Gwagwalada Area Council has the highest prevalence of these crimes. Kwali Area Council, located at the lower part of figure 2, shows a tendency towards breach of peace, broken store and unlawful possession, while Kuje Area Council, located at the right side, shows a high tendency towards vehicle theft, car stealing and burglary. The score plot classified the crimes into two groups: (1) the concentrated crimes, consisting of rape, GHW, car stealing, robbery, murder and vehicle theft, and (2) the less concentrated crimes, consisting of assault, burglary, breach of peace and unlawful possession.
5. Conclusion
The following conclusions are drawn from the analysis. There is a strong positive relationship between robbery and rape, grievous hurt and wound (GHW), theft, assault, murder and unlawful escape. These relationships were also significant, which means that these variables can be used to predict (explain) one another. It was also observed that the correlations between robbery and burglary, and between murder and broken store, were negative and insignificant. The Area Council with the highest crime rate is Gwagwalada, Kuje has a moderate crime rate, while Kwali Area Council has the lowest crime rate in the Area Command. Three PCs (Rape, Robbery and GHW) that explain about 86.873 per cent of the total variability of the data set are suggested to be retained.
The score plot has classified the crimes into two groups, namely (1) concentrated offences: rape, GHW, vehicle theft, theft, car stealing, robbery and murder, and (2) less concentrated offences: assault, burglary, breach of peace and unlawful possession. Based on this, the components have geographically divided Gwagwalada Area Command between the north and the south in relation to the crime classifications. The southern part of the Area Command contains more assault, breach of peace, theft and vehicle theft, while the northern part has a prevalence of concentrated crimes such as rape, GHW, robbery, murder and car stealing. This will help in identifying the distribution of crimes in Gwagwalada, Abuja, allowing investors to measure the level of risk and to plan preventive measures for safeguarding their investments.
References