Spatial and Temporal Analysis of Seasonal Traffic Accidents

This paper presents an approach to analyze spatial and temporal (spatiotemporal) patterns of traffic accidents and to organize them according to their level of significance. This approach was tested using three years (2011-2013) of traffic accident data for Sherbrooke. The spatiotemporal patterns of traffic accidents were analyzed using kernel density estimation (KDE) for four different seasons. Two different crash measures were compared: simple crash counts and severity-weighted crash counts. The results show that severity-weighted crash counts reveal the effect of a single fatal/severe injury or light injury crash on the patterns. However, the lack of a significance test is the main drawback of the KDE. Therefore, this paper integrates the KDE with local Moran’s I to identify statistically significant clusters of traffic accidents within each area. Specifically, the density calculated by the KDE is used as the attribute (input value) for calculating local Moran’s I. Our findings show that the method successfully detected traffic accident clusters and hazardous areas in Sherbrooke.


Introduction
Understanding when and where traffic accidents occur on a road network is one of the most significant questions faced by traffic engineers. According to the World Health Organization [1], about 1.25 million people die each year on the world's roads as a result of road accidents. The annual social cost of road traffic accidents in Canada, in terms of medical treatment, loss of life, rehabilitation and property damage, is estimated to be ten billion dollars [2].
Road traffic accidents are the result of a complex interaction of various environmental and technical factors. Technical risk factors are related to traffic characteristics and volume, which determine risk exposure, and can be managed by monitoring the geometric design of roads. Environmental risk factors, such as weather, have a great impact on the collision rate throughout the year for all climates. They affect road safety in terms of reduced driver visibility, reduced pavement friction and so forth [3]. Therefore, it is crucial for transportation authorities to identify potential hazardous locations and their occurrence time in order to develop strategies to prevent them.
A review of previous studies shows that spatial analysis of traffic accidents has been widely used for investigating hazardous locations [4][5][6][7][8][9][10][11][12]. These studies have evaluated the distribution of traffic accidents using two categories of methods: distance-based methods and density-based methods. The first group measures the spatial dependence of point events based on the distance of points from each other. It includes techniques such as the nearest neighbor distance, Getis-Ord Gi*, K-function and local Moran's I [7,13,14,15,16,17]. The second group, using KDE, measures the intensity of point events based on the density in a region. The purpose of KDE is to create a smooth density surface of point events over space by counting the number of crashes at each location as a density estimate. For each point event in the network, a kernel density surface is defined; the density value is highest at its center and decreases with distance from the center [9,18]. This method is suitable for visualizing the crash data as a continuous surface [5,19]. However, it has some limitations. One obvious defect of the method is that only the spatial dimension is used as a conditioning variable [20].
Unlike spatial analysis, few studies have been dedicated to the temporal analysis of traffic accidents [21][22][23]. These studies performed temporal analysis to examine the clustering of traffic accidents for different time-series, such as hourly, daily, monthly, and yearly. However, their results were mainly presented as simple line graphs or tables, which do not provide a visual representation of collision clusters over time.
More recently, spatiotemporal analysis has been applied to investigate the spatial and temporal patterns of traffic accidents [23][24][25]. Brunsdon [20] introduced a spatiotemporal method known as comap. In this method, a time period is divided into time-series with similar intervals, and their patterns can then be analyzed and presented using a spatial pattern method like the KDE. This method has been successfully used in other studies, such as for fire incidents and crime mapping. For instance, Asgary et al. [26] applied the comap method to show how the spatial pattern of fire incidents in Toronto varied over time. Plug et al. [25] used this method to investigate the spatiotemporal interaction effect on single vehicle accidents. Their results indicate that the comap method successfully highlights particular locations associated with a high crash density during a particular period.
However, previous studies that have used spatiotemporal methods neglected some important issues in their crash data analysis. Firstly, these studies made no distinction between season-related crashes and treated all types of crashes equally, while the traffic accident distribution fluctuates over the months and seasons of a year. Secondly, the studies did not consider the impact of seasons (different weather conditions) on different levels of crash severity. Weather can increase the severity of crashes through different factors, such as precipitation, strong wind, fog/haze, and freezing rain.
In addition, the other inevitable drawback of the KDE method is that an investigation of the statistical significance of the high-density locations is missing [4,11,25]. Therefore, it is necessary to test the significance (robustness) of clusters more objectively [27,28,29,30]. Local indicators of spatial association (LISA) can be used to examine the significance of clusters. Local Moran's I [31] is the most common type of LISA, which is used to evaluate the statistical significance of the high-density locations for each season.
In this study, traffic accidents were first divided into four subsets according to the season in which they occurred. Second, a weight was assigned to each observed crash based on its severity. Third, the density of traffic accidents, using simple crash counts (Experiment I) and based on the severity of accidents (Experiment II), was computed using KDE. Then, the KDE results (with and without severity) were used as the attribute for calculating local Moran's I.
The aim of this study was to investigate the spatial and temporal patterns of traffic accidents and to test the significance of the clusters. The remainder of the paper is organized as follows. Section two describes the databases and the main methods used in this study. Section three presents the results from applying the KDE and comap to identify season-related hazardous locations. Finally, section four presents the discussion and conclusions of the study.

Data Used in This Study
This study focuses on the city of Sherbrooke, located in the southern Quebec region of eastern-central Canada. Sherbrooke covers an area of approximately 353.5 km², and its population in 2011 was about 154 600 (about 0.4 % of Canada). The study only focuses on urban areas and considers all types of roadways (i.e., local, collectors, and arterial roads) within the city boundary, excluding highways. For the study, two different databases were used from various sources. First, a roadway network base map was obtained from the "Ville de Sherbrooke". The map was provided in a shape file format, which includes roadway specifications such as shape length (segment length), road type, and speed limits. The shape file contains 8327 segments. Second, a three-year (2011-2013) traffic accident database was provided by the "Société de l'assurance automobile du Québec (SAAQ)". During the study period, a total of 7897 collisions were recorded on Sherbrooke's roadways. The accident database was provided in an Excel format and contains significant crash parameters such as the date and time of a collision, accident location, age and sex of drivers, etc. A description of traffic accident types is provided in Table 1, which distinguishes the main categories of crashes. These crashes were then converted into a shape file and mapped using ArcGIS based on their latitude and longitude.
This study only considers vehicle-to-vehicle crashes in the safety analysis, and other types of crashes such as pedestrian and cycling crashes are outside the scope of this study. The study area and distribution of all vehicle crashes are shown in Figure 1.

Kernel Density Estimation
The KDE is one of the most appropriate methods to identify spatial patterns of traffic accidents. It calculates the density of events within a specific bandwidth (search radius) around each point in the study area to generate a smoothed surface. The KDE uses a kernel function to assign a weight to the area surrounding the point event proportional to its distance to the point event. In other words, the surface value is highest at the point location (i.e., the center) and drops smoothly to a value of zero at the radius of the circle (bandwidth). Finally, it generates a smoothed continuous density surface by adding up the individual kernels in the study area [4,13,18,26,32]. The intensity at a specific location is calculated by (Equation 1):

$$f(s) = \sum_{i=1}^{n} \frac{1}{\pi r^{2}} K\left(\frac{d_{is}}{r}\right)$$

where $f(s)$ is the density measured at location $s$, $r$ is the radius of the circle (bandwidth), $K(\cdot)$ is the kernel, which is a function of the bandwidth and distance, and $d_{is}$ is the distance between points $s$ and $i$, with the sum taken over the $n$ point events.
In addition, there are several types of kernel functions, such as quadratic, uniform, Gaussian, trigonometric, etc., but the results of the network KDE are more dependent on the search bandwidth [6,11,18]. Therefore, it is crucial to select an appropriate bandwidth because it will strongly affect the density pattern. If the size of the bandwidth is large, then the estimated density will appear smooth and local details will be obscured. A very small bandwidth, however, will produce a very sharp density pattern (as local spikes) at event locations [33]. Accordingly, the results of both cases may lead to false conclusions.
In previous studies, researchers used an iterative (trial and error) technique to obtain an optimal search bandwidth [4,6,10,25]. This study followed their suggestion, and a bandwidth of 100 m was selected for the analysis of high-density crash locations.
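As a rough illustration of Equation 1, the following minimal sketch evaluates a planar kernel density at a set of grid locations. It makes several assumptions that are not stated in the paper: Euclidean (not network) distances, a quartic kernel scaled so that each crash contributes a total mass of one, and the hypothetical function names `quartic_kernel` and `kernel_density`. The optional `weights` argument is also an assumption, included so that the same function can serve simple and severity-weighted counts.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel (assumed form): largest at u = 0, zero for u >= 1.

    The factor 3 makes each event integrate to one once the result is
    divided by pi * r**2, as in Equation 1.
    """
    u = np.asarray(u, dtype=float)
    return np.where(u < 1.0, 3.0 * (1.0 - u**2) ** 2, 0.0)


def kernel_density(grid_xy, crash_xy, bandwidth=100.0, weights=None):
    """Density at each grid location from crash points (coordinates in metres).

    grid_xy   : (m, 2) array of locations s where the density is evaluated
    crash_xy  : (n, 2) array of crash locations
    bandwidth : search radius r in metres (100 m in this study)
    weights   : optional per-crash weights (assumption, default all ones)
    """
    grid_xy = np.asarray(grid_xy, dtype=float)
    crash_xy = np.asarray(crash_xy, dtype=float)
    if weights is None:
        weights = np.ones(len(crash_xy))
    weights = np.asarray(weights, dtype=float)

    # Pairwise Euclidean distances d_is, shape (m, n).
    diff = grid_xy[:, None, :] - crash_xy[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))

    # Each crash contributes K(d_is / r) / (pi * r^2); sum over crashes.
    contrib = quartic_kernel(dist / bandwidth) / (np.pi * bandwidth**2)
    return contrib @ weights


# One crash at the origin, evaluated at 0 m, 50 m, 100 m and 150 m away.
grid = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (150.0, 0.0)]
density = kernel_density(grid, [(0.0, 0.0)], bandwidth=100.0)
```

In this toy case the density peaks at the crash location, decreases at 50 m, and is zero at and beyond the 100 m bandwidth, which matches the behavior described above. Rerunning the call with a larger or smaller `bandwidth` shows the smoothing and spiking effects discussed in this section.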

Comap
Comap is an extension (geographical variant) of a technique known as the co-plot [34]. It is an exploratory graphical approach for examining the relationships between a pair of variables (i.e., the location of traffic accidents) and their variation over time [20]. In this study, it works by subdividing the three-year (2011-2013) aggregated crash data according to the season of the year. Then the density of each subset is analyzed using the KDE. Finally, the results are presented in various maps or plots and arranged successively to show how the spatial distribution of traffic accidents changes over time [25][26]. This study explored the relationship between the spatial distribution of traffic accidents and their variation throughout the seasons of the year. The subdividing process needs to be done carefully because it can lead to artifacts in the results. It has been suggested that each class should have a similar number of traffic accidents, and the class boundaries should overlap each other [20,25,26].
In this study, as shown in Table 2, the crashes were divided into four ordered time intervals (i.e., four seasons). During each time interval, some days overlap to avoid the temporal boundary problem. In addition, the use of the comap method offers some advantages and limitations. The first advantage is that it represents the spatial distribution changes in traffic accidents over time in a single visualization. The second advantage is related to dividing the data into classes of interest (i.e., by time or by cause). One limitation encountered with the method is that it overlaps class boundaries in order to have a similar amount of data in each class [25,35].
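The subdivision step can be sketched as follows. The season windows below are hypothetical placeholders, not the boundaries of Table 2; they only illustrate how overlapping class boundaries let a crash near a boundary belong to two adjacent subsets. The names `SEASONS`, `seasons_for`, and `split_by_season` are likewise assumptions made for this sketch.

```python
from datetime import date

# Hypothetical overlapping windows as ((month, day) start, (month, day) end).
# These are NOT the Table 2 boundaries; each window overlaps the next by ten days.
SEASONS = {
    "winter": ((12, 1), (3, 10)),
    "spring": ((3, 1), (6, 10)),
    "summer": ((6, 1), (9, 10)),
    "fall": ((9, 1), (12, 10)),
}


def in_window(d, start, end):
    """True if the date's (month, day) falls inside the window, inclusive."""
    md = (d.month, d.day)
    if start <= end:
        return start <= md <= end
    # The window wraps around the end of the year (e.g. winter).
    return md >= start or md <= end


def seasons_for(d):
    """All seasons whose window contains the date (two near a boundary)."""
    return [name for name, (start, end) in SEASONS.items() if in_window(d, start, end)]


def split_by_season(crash_dates):
    """Group crash dates into one subset per season, pooled across years."""
    subsets = {name: [] for name in SEASONS}
    for d in crash_dates:
        for name in seasons_for(d):
            subsets[name].append(d)
    return subsets


crashes = [date(2011, 1, 15), date(2012, 3, 5), date(2013, 7, 20), date(2013, 12, 5)]
subsets = split_by_season(crashes)
```

Each subset would then be passed to the KDE separately, and the four resulting maps arranged in season order to form the comap panels.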

Local Moran's I
Local Moran's I [31] is one of the most widely used LISA statistics. It measures the statistical correlation between attributes at each location in a study area and the values (usually the statistical mean) in the neighboring locations and also tests the significance of this similarity [30,36]. Formally, local Moran's I [31] can be expressed as (Equation 2):

$$I_i = (x_i - \bar{x}) \sum_{j} w_{ij} (x_j - \bar{x})$$

where $w_{ij}$ is a measure of the spatial weight between regions $i$ and $j$, $\bar{x}$ is the mean value, and $x_i$ and $x_j$ are the values of the variable at locations $i$ and $j$.
In general, there are four types of correlation among neighboring values: high-high (H-H), low-low (L-L), high-low (H-L), and low-high (L-H). High-high and low-low indicate that there is a positive autocorrelation, while high-low and low-high show that there is a negative autocorrelation [33]. The high-high areas are relevant for hazardous location detection because they show locations where a high number of crashes is surrounded by high values [37,38].
An important issue is how to determine whether the measure of spatial autocorrelation is statistically significant. One approach for testing the significance of local Moran's I is a permutation test following a randomization null hypothesis ($H_0$). A permutation test consists of randomly reassigning the given attribute values under the null hypothesis and calculating Moran's I value each time [27,36,38]. To evaluate the significance of the observed spatial pattern, the observed value of Moran's I was compared to the randomly simulated distribution to obtain the p-value [31]. In this study, the number of permutations was set to 499, which is applied to each observation. Three significance levels of P < 0.05 (95%), P < 0.01 (99%), and P < 0.001 (99.9%) were used to indicate significant clusters. The GeoDa software was used for the local spatial autocorrelation analysis.
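The local statistic and its permutation test can be sketched as below. This is an illustrative sketch under stated assumptions and not a reproduction of GeoDa's implementation: it assumes row-standardized weights, a conditional permutation in which the value at location i is held fixed while the remaining values are shuffled, and an upper-tail pseudo p-value computed as (M + 1) / (R + 1), where M counts simulated values at least as large as the observed one and R is the number of permutations. The function names are hypothetical.

```python
import numpy as np


def quadrant_labels(x, w):
    """Label each location H-H, H-L, L-H or L-L from its value and spatial lag.

    Values exactly at the mean are grouped with the 'low' side (assumption).
    """
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    lag = np.asarray(w, dtype=float) @ z
    return np.where(
        z > 0,
        np.where(lag > 0, "H-H", "H-L"),
        np.where(lag > 0, "L-H", "L-L"),
    )


def local_moran_permutation(x, w, n_perm=499, seed=0):
    """Local Moran's I (Equation 2 form) with upper-tail pseudo p-values."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    z = x - x.mean()
    observed = z * (w @ z)

    p_values = np.empty(len(x))
    for i in range(len(x)):
        others = np.delete(z, i)      # values that get reshuffled
        w_i = np.delete(w[i], i)      # weights of location i to the others
        sims = np.empty(n_perm)
        for k in range(n_perm):
            sims[k] = z[i] * (w_i @ rng.permutation(others))
        # Share of simulated statistics at least as large as the observed one.
        extreme = np.sum(sims >= observed[i])
        p_values[i] = (extreme + 1) / (n_perm + 1)
    return observed, p_values


# Toy example: ten locations on a line, a high-density cluster at one end.
n = 10
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
w /= w.sum(axis=1, keepdims=True)     # row-standardize (assumption)

x = np.array([9, 9, 9, 1, 1, 1, 1, 1, 1, 1], dtype=float)
observed, p_values = local_moran_permutation(x, w, n_perm=499, seed=0)
labels = quadrant_labels(x, w)
```

In practice `x` would hold the KDE density values for each area and `w` a spatial weights matrix built from the study area. Under the pseudo p-value formula assumed here, 499 permutations give a smallest attainable value of 1/500 = 0.002.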

Spatiotemporal Analysis (Experiment I)
In this section, a comap is generated to visualize the spatial and temporal distribution of traffic accidents. This technique helps to determine whether the same hazardous locations are subject to temporal fluctuations in traffic accidents. According to the temporal framework analysis, as shown in Figure 2, the distribution of traffic accidents in Sherbrooke varies over time (i.e., seasons). It is evident that crash patterns are different among the seasons. Crashes are more evenly distributed in the spring (i.e., panel 2) and tend to be more clustered (highlighted in orange and red) and widespread in the summer, fall, and winter (i.e., panels 3, 4, and 1 respectively). According to the spatial framework analysis, as shown in Figure 2, traffic accidents are more uniformly distributed across all four sampling periods, with higher intensities of crashes around the downtown area and along the main roads throughout the city.
The results show that the occurrence of traffic accidents varies in both space and time. The degree of variation appears to be dependent on several significant factors. For instance, in winter, these variations could be due to the weather conditions (as one environmental risk factor), including snow, rain, and freezing rain in general (and specific extreme risks such as "black ice", in particular). Fall is an unpredictable time of the year and the first snow sometimes appears during this period (e.g., in November or earlier). We believe that this is a risky period because many drivers are not yet accustomed to the new weather conditions. It also seems that the crash variations in summer are due to other reasons like driving faster, road construction, etc. In addition, the comap technique can be used for identifying season-related hotspots. Identifying these locations could help transportation authorities and planners to more efficiently allocate their limited budgets and traffic safety resources. This study defined a location as a season-related hotspot if crashes happen frequently only during a certain season. Figure 3 shows these locations highlighted with a black circle. For instance, in winter (see panel 1), crashes frequently occur at the 12e Avenue-King East intersection, while its density is not particularly high in other seasons. In summer (see panel 3), the crash density is high at the College-Queen intersection. In the fall (see panel 4), some clustering occurs at Portland Boulevard, particularly around the Carrefour de l'Estrie shopping center.

Influence of Various Seasons on Severity of Collisions (Experiment II)
One of the objectives of the study was to demonstrate the relationship between season-related crashes and the severity of collisions. To examine this, instead of using the simple crash counts, a weight should be assigned to each observed crash (simple crash count) based on its crash severity. Various weighting factors have been suggested, and this study used the weighting factors proposed by Agent (1973) [39]. Hence, this study assigned weight 1 to property damage only (PDO) crashes, weight 3 to light injury crashes, and weight 9 to serious injury and fatal crashes (fatal/severe injury crashes). Then, the KDE must be calculated for each subset (i.e., season) to demonstrate the spatial distribution of crash severity changes over the seasons. Figure 4 shows the univariate comap results demonstrating the effects of the severity of collisions on crash patterns in each season. The technique was improved by integrating bar graphs with a comap (for each sampling period). The bar graphs were applied to illustrate the effects of a single fatal/severe injury and light injury crash on the crash patterns. As shown, crashes are significantly clustered in summer and fall, followed by winter and spring respectively. As represented in Figure 4, the highest number of fatal/severe injury and light injury crashes (14 and 441 respectively) occurred during the summer. In contrast to Experiment I, where crashes mostly occurred in the downtown area, in Experiment II crashes are more dispersed throughout the city. In fact, crash clusters are mainly in the downtown area, main intersections and along the main routes such as King West, King East, Galt Street, and Portland Boulevard. The high densities at these locations are due to the fatal/severe injury crashes. In addition, a statistical comparison was made between the density values based on simple collision counts (Experiment I) and the severity of collisions (Experiment II). 
The density values calculated for all the areas using the KDE are then used as an attribute for computing Moran's I. To identify high-high (H-H) areas for different significance levels, the simulation model was repeated for 499 permutations. The number of statistically significant H-H areas for each experiment at different significance levels is presented in Table 3. The results show that the number of significant areas is higher at a significance level of 0.05 (p<0.05) for both experiments. As illustrated in Table 3, as expected, the number of significant H-H areas is relatively higher in Experiment II than in Experiment I due to the effects of crash severity on crash density patterns. Accordingly, a higher number of significant areas (Experiment II) is shown for the summer, fall, and spring when more fatal/severe injury and light injury crashes occurred on the road networks. The results revealed that the number of significant areas in Experiment I is relatively high in the summer and remained constant in other seasons. In Experiment II, the number of H-H areas is higher in the summer and fall due to a higher number of fatal/severe injury and light injury crashes occurring in these periods.
The spatial distribution of H-H areas (the highlighted red areas) for a significance level of 0.05 for both experiments is shown in Figure 5. Note that the red areas indicate spatial clusters. The overall patterns (both experiments) depict clusters of high traffic accidents (H-H) in the downtown area and along the main roadways.
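The difference between the two crash measures compared in this section can be illustrated with a small sketch. The weights 1, 3, and 9 are those stated for Experiment II; the category keys and function names are hypothetical labels chosen for this example, and the crash lists are made-up illustrations, not data from the study.

```python
# Severity weights used in Experiment II; the dictionary keys are
# hypothetical labels, not field names from the SAAQ database.
SEVERITY_WEIGHTS = {
    "pdo": 1,               # property damage only
    "light_injury": 3,
    "severe_or_fatal": 9,   # serious injury and fatal crashes
}


def simple_count(severities):
    """Experiment I measure: every crash counts once."""
    return len(severities)


def weighted_count(severities):
    """Experiment II measure: each crash counts by its severity weight."""
    return sum(SEVERITY_WEIGHTS[s] for s in severities)


# Two illustrative (made-up) locations.
location_a = ["pdo"] * 5                   # five PDO crashes
location_b = ["severe_or_fatal", "pdo"]    # one fatal/severe and one PDO crash

counts = (simple_count(location_a), simple_count(location_b))
weighted = (weighted_count(location_a), weighted_count(location_b))
```

Location A ranks higher under simple counts (5 versus 2), while location B ranks higher once severity is weighted (10 versus 5). This is the sense in which a single fatal/severe injury crash can change the density pattern when the weights are used as inputs to the KDE.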

Discussion
Traffic accidents are one of the leading causes of death worldwide; hence, identifying the spatiotemporal distribution patterns of traffic accidents and their hotspots/hazardous locations can help to determine where and when intervention actions should be taken. In comparison with traditional methods for identifying hotspot traffic accident patterns, spatiotemporal analysis can provide a valuable root cause analysis of crash events [26]. This study used spatiotemporal analysis to evaluate and visualize changes in crash density patterns over time (seasons). It also investigated how taking the severity of traffic accidents into consideration can affect the distribution of traffic accident patterns.
The results show that crash patterns vary according to the specific season. This allows transportation planners to focus on specific areas and at a specific time of year. For instance, the number of traffic accident clusters is relatively high in the summer and fall respectively. During these periods, traffic accidents frequently occurred in the downtown area and along the main roadways. Also, the results indicate that the approach is useful for identifying season-related hotspot locations. Finding these locations can help transportation authorities and planners to more efficiently allocate their limited budgets and traffic safety resources. In addition, the effect of taking the crash severity into consideration was examined by comparing the comap results from simple crash counts (Experiment I) with the results using weights based on severity (Experiment II). The comparison showed that Experiment II very clearly revealed the effect of a single fatal or injury crash on the pattern.
The KDE method has been widely used for detecting collision hotspots, although it does not test the statistical significance of high-density locations. Therefore, it was integrated with local Moran's I (a local spatial statistics approach) to detect hotspot locations and to determine which of them are significant. In particular, the density values computed by KDE were used as an attribute in Moran's I to evaluate the significant locations with high-density values. To identify high-high (H-H) areas, two experiments were tested for different significance levels. The results show that Experiment II leads to a higher number of statistically significant H-H areas and clusters.

Conclusion
This study proposed an integrated method to evaluate and analyze the occurrence of traffic accidents through GIS-based spatial and temporal techniques. The purpose of this research was to investigate the relationship between time (i.e., season) and the location of traffic accidents. The proposed approach is suitable for identifying the cluster pattern of traffic accidents, but some areas still need to be improved. First, in this study, the spatial characteristics of traffic accidents and the severity of accidents were analyzed, whereas there are many other factors associated with traffic accidents. Hence, further study is needed to consider other safety parameters, including road type, traffic volume, household income, etc. Second, this study used an iterative (trial and error) technique to find the most appropriate bandwidth size in the KDE analysis. Therefore, the development of a scientific method for selecting the most appropriate bandwidth size should be considered in future research.